Running git diff-tree with --numstat and --name-status - git-diff

I'm writing a script to analyze the changes that have been made to a Git repo.
At some point I need to iterate over all the commits and obtain the following information about each of them:
Commit ID
Date
Commit Message
...
Files changed
  File Name
  Type of change (Added/Modified/Removed/Renamed)
  New File Name (in case the change type is "Renamed")
  Number of lines added
  Number of lines removed
I get the commit messages and dates with git log. The problem is the files.
If I didn't need the number of lines added/removed, I'd simply use:
git diff-tree --no-commit-id --name-status -M -r abcd12345
The output would be something like:
A Readme.md
M src/something.js
D src/somethingelse.js
R100 tests/a/file.js tests/b/file.js
which I can parse programmatically.
To get information about lines added/removed, I could use this:
git diff-tree -M -r --numstat abcd12345
The output would look like this:
abcd12345
82 0 Readme.md
41 98 src/something.js
0 64 src/somethingelse.js
0 0 tests/{a => b}/file.js
This is not very machine-readable for renamed files.
My question is: Is there any way to combine these two commands? It seems I can't use --numstat with --name-status.
I could also run two separate commands and merge the results in my script. In that case, are there any other switches I can use to make the output of the second command more machine-readable?
Thanks.

I think your analysis (that you need two separate commands) is correct. Use -z to obtain machine-readable output with --numstat (this disables both fancy rename encoding and all special-character-quoting), but note that you will then have to break lines apart at ASCII NULs instead of newlines.
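As a minimal bash sketch of the -z approach (assuming --no-commit-id, so the commit ID is not mixed into the stream): per the Git diff-format documentation for --numstat with -z, a normal record is "added TAB deleted TAB path NUL", while a rename or copy record has an empty path field followed by the old and new paths as two more NUL-terminated fields. The function name parse_numstat_z is hypothetical.

```shell
# Hypothetical helper: turn NUL-delimited `git diff-tree -z --numstat` output (stdin)
# into one TAB-separated line per file: added, deleted, old path, new path.
# Requires bash (read -d '' and $'\t'). Binary files report "-" for both counts.
parse_numstat_z() {
  local rec added rest deleted path old new
  while IFS= read -r -d '' rec; do
    added=${rec%%$'\t'*}      # text before the first TAB
    rest=${rec#*$'\t'}
    deleted=${rest%%$'\t'*}   # text between the first and second TAB
    path=${rest#*$'\t'}       # text after the second TAB
    if [ -z "$path" ]; then
      # Rename/copy record: empty path, then old and new paths as separate fields.
      IFS= read -r -d '' old
      IFS= read -r -d '' new
    else
      old=$path
      new=$path
    fi
    printf '%s\t%s\t%s\t%s\n' "$added" "$deleted" "$old" "$new"
  done
}

# Usage (abcd12345 is the example commit from the question):
#   git diff-tree --no-commit-id -z -M -r --numstat abcd12345 | parse_numstat_z
```

Printing TAB-separated lines is only for readability; if paths may contain TABs or newlines, keep the values in variables inside the loop instead of printing them.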

Related

Find Path to Files containing specific word - How can I know which file in a directory contain the specific word that I want?

I managed to find the word I'm looking for in a GitHub repository with the following command:
git log --all -p | grep 'abc'
abc is a word located in a specific file.
My question is: how can I find the path of the file that contains the string? How can I tell which file contains the word abc?
For example, running the above command gives me
(this.b(),this.abc);
but I would like to know exactly which folder and which file this piece of code comes from.
Any suggestion is appreciated.
git grep -l 'abc'
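The -l option lists the names of files that contain the pattern. Without a revision argument, git grep searches the tracked files in the working tree; add a revision (for example HEAD) to search a particular commit instead.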
Just run
git log --all -p
This shows the output in the pager, usually less, where you can search and scroll forward and backward.
To search the string, type / a b c Enter (the abc here is a regular expression, not a literal string). Type n to find the next occurrence. When you have found one, you can scroll back with b and look at the patch text to see which file it is. (BTW, type Space to scroll forward; type q to exit the pager.)
The following (tested with GNU awk) should be an approximation of what you want:
git log --all -p |
awk '/^diff --git / {files = $0}
/^@@ /,/^(commit |diff --git )/ {if(index($0, "abc")) print files}'
We store the diff --git line in the variable files. Then, if your string is found between the next line starting with @@ and the next line starting with commit or diff --git, the value of files is printed.
It is only an approximation because the string abc could also appear in the diff --git or commit lines, and because lines starting with @@, diff --git or commit could appear in file contents.
More accurate solutions exist but they are more complicated and those I can think of cannot be 100% perfect.
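A sketch of a variant of the same approximation, assuming the default git log -p output format: instead of printing the whole diff --git line, it remembers the current commit and the b/ path, only looks at added or removed lines, and prints each commit/file pair once. The function name find_files_in_patch is hypothetical. It is still not exact: it skips added lines whose content begins with "++ " and removed lines whose content begins with "-- ", and a path containing " b/" is cut at the last occurrence.

```shell
# Hypothetical helper: read `git log -p` output on stdin and print "<commit> <path>"
# for every file whose added or removed lines contain the literal string $1.
find_files_in_patch() {
  awk -v pat="$1" '
    /^commit / { commit = $2 }                         # remember the current commit
    /^diff --git / {
      file = $0
      sub(/^diff --git a\/.* b\//, "", file)           # keep the path after " b/"
    }
    /^[-+]/ && !/^([+][+][+]|---) / && index(substr($0, 2), pat) {
      key = commit " " file
      if (!(key in seen)) { seen[key] = 1; print key } # print each pair once
    }
  '
}

# Usage:
#   git log --all -p | find_files_in_patch 'abc'
```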

Getting tracking information for a Git branch

How can I get tracking information (i.e. remote and branch name) about a specific local Git branch, preferably in one command? There seem to be many ways to do this, e.g.
git rev-parse --abbrev-ref --symbolic-full-name branch_name@{upstream}
However, it returns the upstream in the form 'origin/branch_name', which makes it difficult to separate the parts (e.g. when the remote or branch name contains '/'). Is there a more reliable solution, preferably using a single Git command?
In another answer, @RomainValeri suggested this command to display the tracking information:
git for-each-ref --format="%(upstream:short)" refs/heads/<yourBranch>
However, if you want to get rid of the slash, you can do this:
# The space between the two atoms is the separator; replace it with any separator you like.
git for-each-ref --format="%(upstream:remotename) %(upstream:lstrip=-1)" refs/heads/<yourBranch>
From the git for-each-ref documentation:
upstream
The name of a local ref which can be considered “upstream” from the
displayed ref. Respects :short, :lstrip and :rstrip in the same way as
refname above ...
For any remote-tracking branch %(upstream), %(upstream:remotename) and
%(upstream:remoteref) refer to the name of the remote and the name of
the tracked remote ref, respectively. In other words, the
remote-tracking branch can be updated explicitly and individually by
using the refspec %(upstream:remoteref):%(upstream) to fetch from
%(upstream:remotename).
More on lstrip:
If lstrip=<N> (rstrip=<N>) is appended, strips <N> slash-separated
path components from the front (back) of the refname (e.g.
%(refname:lstrip=2) turns refs/tags/foo into foo and
%(refname:rstrip=2) turns refs/tags/foo into refs). If <N> is a
negative number, strip as many path components as necessary from the
specified end to leave -<N> path components (e.g. %(refname:lstrip=-2)
turns refs/tags/foo into tags/foo and %(refname:rstrip=-1) turns
refs/tags/foo into refs). When the ref does not have enough
components, the result becomes an empty string if stripping with
positive <N>, or it becomes the full refname if stripping with
negative <N>. Neither is an error.
Some examples:
Format: "%(upstream:remotename):%(upstream:lstrip=-1)"
Output: <remote-name>:<branch-name>
Format: "%(upstream:remotename) %(upstream:lstrip=-1)"
Output: <remote-name> <branch-name>
If the branch name itself contains a slash, lstrip=-1 keeps only its last component, so it won't work. Use remoteref instead:
git for-each-ref --format="%(upstream:remotename) %(upstream:remoteref)" refs/heads/<yourBranch>
The output has this format: <remote-name> refs/heads/<branch-name>
To remove refs/heads/ from the output, pipe the above command through:
sed 's/refs\/heads\///g'
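If you want the two parts in separate shell variables, one option (a bash sketch, not taken from the answers here) is to put a TAB between the atoms with the %09 escape that git for-each-ref --format supports, and split on it. Ref names cannot contain a TAB, so the split is unambiguous. The function name split_upstream is hypothetical.

```shell
# Hypothetical helper: read "<remote>TAB<remote ref>" from stdin and print the remote
# name and the branch name (with refs/heads/ stripped) on two lines.
# Returns 1 when the branch has no upstream (empty remote name).
split_upstream() {
  local remote ref
  IFS=$'\t' read -r remote ref
  [ -n "$remote" ] || return 1
  printf '%s\n%s\n' "$remote" "${ref#refs/heads/}"
}

# Usage:
#   git for-each-ref --format='%(upstream:remotename)%09%(upstream:remoteref)' \
#       refs/heads/<yourBranch> | split_upstream
```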
I'd use the built-in -v (verbose) or even -vv (very verbose) flag of git branch to get this. You can also grep for the branch name to focus on the line you want:
git branch -vv | grep <branchName>
Depending on what exactly you want, you could also use the plumbing command:
git for-each-ref --format="%(upstream:short)" refs/heads/<yourBranch>
and make it an alias for convenience:
git config --global alias.get-rem '!f() { git for-each-ref --format="%(upstream:short)" refs/heads/$1; }; f'
# then just
git get-rem branch_name
Edit: For the very short part (i.e. "branch" instead of either "refs/remotes/origin/branch" or even "origin/branch"), you can use %(upstream:lstrip=-1) instead of %(upstream:short).

Filter response from "git diff" command to get only the difference in Shell - Dynamic Solution

I am trying to automate a repetitive deployment process in my project. To do that, I need the difference between two branches, which I get with git diff using the following command:
git diff <BRANCH_NAME1> <BRANCH_NAME2> -- common_folder_name/ > toStoreResponse.txt
The output looks something like this:
diff --git a/cmc-database/common/readme.txt b/cmc-database/common/readme.txt
index 7820f3d..5a0e484 100644
--- a/cmc-database/common/readme.txt
+++ b/cmc-database/common/readme.txt
@@ -1 +1,5 @@
-This folder contains common database scripts.
\ No newline at end of file
+This folder contains common database scripts.
+TEST STTESA
\ No newline at end of file
In the output above, the only line that actually differs between the two branches is TEST STTESA, and I want to store just that text in a separate file, using the shell or Git.
That is, a file named readme.txt that contains only TEST STTESA.
Workaround:
I found a workaround to filter the output, but it is not exactly what I am looking for. The command is:
git diff <Branch_Name1> <Branch_Name2> -- common-directory/ | grep -v common-directory | grep -v index | grep -v @@ | grep -v '^\\'
It returns:
-This folder contains common database scripts.
+This folder contains common database scripts.
+TEST STTESA
But I want to store only the difference, which is TEST STTESA.
As you have probably realized, your solution won't work every time: the grep -v filters depend on this particular output and also drop changed lines that happen to contain those strings.
Here is a "step 0" solution: you want to match lines that start with a "+" not followed by another "+", or with a "-" not followed by another "-". Use grep for that:
git diff ... | grep "^+[^+]\|^-[^-]"
Some explanation:
First, the \| part in the middle is an "or" statement.
Then, each side starts with a ^ which refers to the beginning of the line. And finally, after the first character, we want to reject some characters, using the [^...] syntax.
The line above translates to English as: "Run the diff, and find all the lines that either start with a + followed by something that is not a +, OR start with a - followed by something that is not a -."
This will not work properly if you remove a line that starts with a -, or add a line that starts with a +.
For such scenarios, I would tinker with git diff --color and grep for the [32m (green) color code, just for fun.
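A bash sketch of a more robust filter (a different technique from the grep above): read the line counts from each @@ hunk header and use them to decide which lines belong to the hunk, so added content that itself begins with + or - is handled correctly. It prints only the added lines, without the leading +. The function name added_lines is hypothetical. For the readme.txt example it would still print both "This folder contains common database scripts." and "TEST STTESA", because Git reports the first line as changed too (it gained a trailing newline).

```shell
# Hypothetical helper: print the added lines of a unified diff read from stdin,
# without their leading "+". Hunk header counts decide where each hunk ends,
# so "---"/"+++" inside hunk content is not mistaken for a file header.
added_lines() {
  awk '
    /^@@ / {
      # Header: "@@ -start[,count] +start[,count] @@"; a missing count means 1.
      n = split($2, a, ","); old_left = (n > 1) ? a[2] + 0 : 1
      n = split($3, b, ","); new_left = (n > 1) ? b[2] + 0 : 1
      next
    }
    old_left > 0 || new_left > 0 {
      c = substr($0, 1, 1)
      if (c == "+") { print substr($0, 2); new_left-- }
      else if (c == "-") { old_left-- }
      else if (c == " ") { old_left--; new_left-- }
      # Lines such as "\ No newline at end of file" do not count toward the hunk.
    }
  '
}

# Usage:
#   git diff <BRANCH_NAME1> <BRANCH_NAME2> -- common_folder_name/ | added_lines > readme.txt
```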
--diff-filter=[ACDMRTUXB*]
Select only files that are
A Added
C Copied
D Deleted
M Modified
R Renamed
T have their type (mode) changed
U Unmerged
X Unknown
B have had their pairing Broken
and * All-or-none

Replace/sync only certain lines using Bash, SSH and rsync

I am looking for a quick and dirty one-liner to sync only certain settings in remote config files. I need to preserve what's unique to each host and sync only the generic settings. Example:
Config1.conf:
HOSTNAME=COMP1
IP=10.10.13.10
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
Remote-Config2.txt:
HOSTNAME=COMP2
IP=10.10.13.11
LOCATION=FOO
BUILDING=BAR
ROOM=BAZ
I need to sync (copy and replace) only the bottom 3 lines over ssh. The line numbers are predictable, by the way: always lines 4, 5 and 6 in this case.
Here's a working idea that is missing one piece (a standard replacement for the non-standard utility I used to replace the vars in the local conf):
for var in $(ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf');do <missing piece> ${var/=*}=${var/*=} local-conf.conf; done
So this uses variable expansion and a non-standard utility, but it needs something like a sed or Perl routine to replace the values in the local conf.
Update
The line of code above actually works (tested). However, the missing piece is a custom, non-standard utility. I'm asking whether someone can think of a replacement using standard Linux tools.
One solution would be to match the left side and then replace the right side. This is basically what that utility does: it looks for the variable in the conf, then sets it. Using variable expansion is one way (shown).
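For the missing piece itself, one standard-tools option is sketched below as a hypothetical bash function set_conf_var: awk rewrites the line that starts with KEY= and passes every other line through unchanged. It writes to a temporary file and copies the result back, so the original file keeps its permissions. Caveat: awk -v processes backslash escapes, so values containing \ would be altered.

```shell
# Hypothetical helper: replace the KEY=... line in a KEY=VALUE file with KEY=VALUE.
# Lines for other keys are left untouched; nothing is appended if KEY is absent.
set_conf_var() {
  local key=$1 value=$2 file=$3 tmp status
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { $0 = k "=" v }   # line starts with KEY=
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"  # cat keeps the original file mode
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage with the lines fetched in the question:
#   ssh root@10.10.8.12 'sed -n "4,6p" /etc/conf1.conf' |
#     while IFS= read -r var; do
#       set_conf_var "${var%%=*}" "${var#*=}" local-conf.conf
#     done
```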
Here's an alternative solution that does not require the command to have special knowledge of the file contents:
Take a copy of the files you want to sync. Then, in the copy, deliberately vandalise (arbitrarily modify) the lines you do not want synced. It doesn't matter what they say as long as there are the same number of lines and they'll never match the actual file contents. Have some fun. This becomes your base version. Your example might look like this:
HOSTNAME=foo
IP=bar
LOCATION=SITE_A
BUILDING=DEPT_IT
ROOM=COMP_LAB1
rsync the remote files into a temporary location. This is the remote version.
For each file, take a three-way diff.
diff3 -3 <localfile> <basefile> <remotefile>
The output of diff3 is an "ed script" that describes what edits to make to the local file so that it would look like the remote file.
The -3 option tells it to only output the non-conflicting differences. This is why we vandalised the base files in the first place: so those lines would have conflicts.
Once you have the ed script for a file, you can visually check it, if you choose, and then apply the update using patch:
cat <ed-script> | patch --ed <localfile>
So, to do this recursively, you might have:
cd "$localdir" || exit 1
find . -type f | while IFS= read -r file; do
  diff3 -3 "$file" "$basedir/$file" "$remotedir/$file" | patch --ed "$file"
done
You probably need to add some checks that the base and remote files actually exist.
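A bash sketch of those checks, as a hypothetical function sync_one that takes a path relative to $localdir and uses the same localdir, basedir and remotedir variables as the loop (assumed to hold absolute paths). It skips a file when any of the three copies is missing, and only calls patch when diff3 produced a non-empty ed script.

```shell
# Hypothetical helper: apply the non-conflicting remote changes to one local file.
# Relies on the localdir, basedir and remotedir variables used in the loop.
sync_one() {
  local rel=$1 f script
  for f in "$localdir/$rel" "$basedir/$rel" "$remotedir/$rel"; do
    if [ ! -f "$f" ]; then
      echo "skipping $rel: $f does not exist" >&2
      return 1
    fi
  done
  # diff3 may exit with status 1 when it finds conflicts (expected for the
  # vandalised lines), so only treat other non-zero statuses as errors.
  script=$(diff3 -3 "$localdir/$rel" "$basedir/$rel" "$remotedir/$rel") || [ "$?" -eq 1 ] || return 1
  [ -n "$script" ] || return 0   # nothing to apply
  printf '%s\n' "$script" | patch --ed "$localdir/$rel"
}
```

Inside the loop, sync_one "$file" would take the place of the diff3 ... | patch line.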

How to compare files with same names in two different directories using a shell script

Before moving to SVN, I used to manage my project by simply keeping a /develop/ directory, editing and testing files there, and then moving them to the /main/ directory. When I decided to move to SVN, I needed to be sure that the two directories were indeed in sync.
So, what is a good way to write a shell script (bash) to recursively compare files with the same name in two different directories?
Note: The directory names used above are for sample only. I do not recommend storing your code in the top level :).
The diff command has a -r option to recursively compare directories:
diff -r /develop /main
diff -rqu /develop /main
It will only give you a summary of changes that way :)
If you want to see only new/missing files:
diff -rqu /develop /main | grep "^Only"
If you want just the bare paths:
diff -rqu /develop /main | sed -En 's/^Only in (.+): (.*)$/\1\/\2/p'
The diff I have available allows recursive differences:
diff -r main develop
But with a shell script:
( cd main && find . -type f -exec sh -c 'diff "$1" "../develop/$1"' sh {} ';' )
[I read somewhere that answering your own questions is OK, so here goes :) ]
I tried this, and it worked pretty well:
[/]$ cd /develop/
[/develop/]$ find . -type f | while IFS= read -r line; do diff -uN "/main/$line" "$line"; done | less
You can compare only specific files (e.g., only the .php ones) by changing the line above to:
[/]$ cd /develop/
[/develop/]$ find . -type f -name "*.php" | while IFS= read -r line; do diff -uN "/main/$line" "$line"; done | less
Any other ideas?
Here is an example of a (somewhat messy) script of mine, dircompare.sh, which will:
sort files and directories into arrays depending on which directory they occur in (or both), in two recursive passes
sort the files that occur in both directories into two further arrays, depending on whether diff -q reports them as different
for the files that diff reports as equal, show and compare their timestamps
Hope it can be found useful - Cheers!
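The dircompare.sh script itself is not reproduced here. As a rough bash illustration of the same classification idea (not the author's script), the sketch below builds on find, sort, comm and cmp; compare_dirs and list_files are hypothetical names, file names are assumed not to contain newlines, and the timestamp step is not implemented.

```shell
# Hypothetical helper: list regular files under $1, relative to it, sorted.
list_files() {
  (cd "$1" && find . -type f | sed 's|^\./||' | sort)
}

# Hypothetical helper: classify files under two directories.
# Requires bash for the <( ) process substitutions.
compare_dirs() {
  local a=$1 b=$2 rel
  comm -23 <(list_files "$a") <(list_files "$b") | sed 's/^/only-in-first: /'
  comm -13 <(list_files "$a") <(list_files "$b") | sed 's/^/only-in-second: /'
  comm -12 <(list_files "$a") <(list_files "$b") |
    while IFS= read -r rel; do
      if cmp -s "$a/$rel" "$b/$rel"; then
        echo "same: $rel"
      else
        echo "differs: $rel"
      fi
    done
}

# Usage:
#   compare_dirs /develop /main
```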
EDIT2: (Actually, it works fine with remote files. The problem was an unhandled Ctrl-C during a diff between a local and a remote file, which can take a while; the script is now updated with a trap to handle that. I'm leaving the previous edit below for reference.)
EDIT: ... except it seems to crash my server for a remote ssh directory (which I tried using over ~/.gvfs)... So this is not bash anymore, but I guess an alternative is to use rsync. Here's an example:
$ # get example revision 4527 as testdir1
$ svn co https://openbabel.svn.sf.net/svnroot/openbabel/openbabel/trunk/data@4527 testdir1
$ # get earlier example revision 2729 as testdir2
$ svn co https://openbabel.svn.sf.net/svnroot/openbabel/openbabel/trunk/data@2729 testdir2
$ # use rsync to generate a list
$ rsync -ivr --times --cvs-exclude --dry-run testdir1/ testdir2/
sending incremental file list
.d..t...... ./
>f.st...... CMakeLists.txt
>f.st...... MACCS.txt
>f..t...... SMARTS_InteLigand.txt
...
>f.st...... atomtyp.txt
>f+++++++++ babel_povray3.inc
>f.st...... bin2hex.pl
>f.st...... bondtyp.h
>f..t...... bondtyp.txt
...
Note that:
To get the above, you mustn't forget trailing slashes / at the end of directory names in rsync
--dry-run - simulate only, don't update/transfer files
-r - recurse into directories
-v - verbose (but not related to file changes info)
--cvs-exclude - ignore .svn files
-i - "--itemize-changes: output a change-summary for all updates"
Here is a brief excerpt of man rsync that explains the information shown by -i (for instance, the >f.st...... strings above):
The "%i" escape has a cryptic output that is 11 letters long.
The general format is like the string YXcstpoguax, where Y is
replaced by the type of update being done, X is replaced by the
file-type, and the other letters represent attributes that may
be output if they are being modified.
The update types that replace the Y are as follows:
o A < means that a file is being transferred to the remote
host (sent).
o A > means that a file is being transferred to the local
host (received).
o A c means that a local change/creation is occurring for
the item (such as the creation of a directory or the
changing of a symlink, etc.).
...
The file-types that replace the X are: f for a file, a d for a
directory, an L for a symlink, a D for a device, and a S for a
special file (e.g. named sockets and fifos).
The other letters in the string above are the actual letters
that will be output if the associated attribute for the item is
being updated or a "." for no change. Three exceptions to this
are: (1) a newly created item replaces each letter with a "+",
(2) an identical item replaces the dots with spaces, and (3) an
....
A bit cryptic, indeed - but at least it shows basic directory comparison over ssh. Cheers!
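To make that output easier to scan, a small bash filter over the itemized lines can be sketched. It assumes the layout shown above (the 11-character code, one space, then the name) and only looks at regular files (X = f): a code whose attribute letters are all "+" is reported as new, and any other file entry as changed (content, size, time or another attribute). summarize_itemized is a hypothetical name.

```shell
# Hypothetical helper: read `rsync -i` output on stdin and label regular-file entries.
summarize_itemized() {
  local line code name
  while IFS= read -r line; do
    code=${line%% *}   # itemize code before the first space
    name=${line#* }    # file name after the first space
    case $code in
      '<f+++++++++'|'>f+++++++++') echo "new: $name" ;;
      '<f'*|'>f'*)                 echo "changed: $name" ;;
    esac
  done
}

# Usage:
#   rsync -ivr --times --cvs-exclude --dry-run testdir1/ testdir2/ | summarize_itemized
```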
The classic (System V Unix) answer would be dircmp dir1 dir2, which was a shell script that would list files found in either dir1 but not dir2 or in dir2 but not dir1 at the start (first page of output, from the pr command, so paginated with headings), followed by a comparison of each common file with an analysis (same, different, directory were the most common results).
This seems to be in the process of vanishing - I have an independent reimplementation of it available if you need it. It's not rocket science (cmp is your friend).
