How to get staged diffs for commit? - gitpython

I want to obtain a list of differences that are staged for commit (basically the equivalent of "git diff --cached").
I'm using gitpython. I have found that I can get a list of staged files easily enough, but as soon as I request generation of a diff, the list becomes empty.
#!/usr/bin/env python3
from git import Repo
myrepo = Repo() # current directory
staged_files = myrepo.index.diff(myrepo.head.commit, create_patch=False)
print(staged_files)
staged_blobs = myrepo.index.diff(myrepo.head.commit, create_patch=True)
print(staged_blobs)
What I get is this:
[<git.diff.Diff object at 0x7f52753c7710>, <git.diff.Diff object at 0x7f527538f200>]
[]
Namely, the first call gives me a list of Diff objects, with each entry corresponding to one of my staged files, but the second call gives me an empty list.
Why does the second call not give me the same list as the first, but with difference information incorporated?
Is there some other approach I'm supposed to use to obtain this information?

It's a bug. I found a write-up of it, and the workaround, in GitPython issue 852.
The workaround is to add the argument R=True to the argument list when create_patch=True and the first argument is not None; the details are in the analysis in that issue.
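A minimal sketch of that workaround, assuming a GitPython version affected by issue 852. The helper name staged_patches is hypothetical. As I understand the index.diff implementation, diffing the index against a commit internally reverses the comparison, and passing R=True cancels that, so the patches read HEAD to index, the same direction as git diff --cached.

```python
def staged_patches(repo):
    """Return (path, patch_text) pairs for the changes staged in repo's index.

    `repo` is assumed to be a git.Repo from GitPython (hypothetical helper).
    """
    # R=True is the issue-852 workaround: without it, create_patch=True
    # returns an empty list when the index is diffed against a commit.
    diffs = repo.index.diff(repo.head.commit, create_patch=True, R=True)
    patches = []
    for d in diffs:
        # Prefer the new-side path; fall back to a_path if b_path is missing.
        path = d.b_path or d.a_path
        # With create_patch=True the patch body is in d.diff, normally as bytes.
        text = d.diff.decode("utf-8", errors="replace") if isinstance(d.diff, bytes) else d.diff
        patches.append((path, text))
    return patches
```

Usage: with from git import Repo, loop over staged_patches(Repo(".")) and print each path and its patch text.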

Related

How to stage specific lines to git

I want to be able to stage specific lines of code that match a pattern (MARKETING_VERSION in my case).
I've got the awk command which will show me the lines that match the pattern MARKETING_VERSION but I don't know how to stage the lines from that result to git.
awk '/MARKETING_VERSION/{print NR}' exampleFile.txt
The result in the terminal is:
1191
1245
How can I use this result to stage those specific lines in that file to git?
I know you can use git add -p but I want to use this in a shell script so I need a non-interactive version.
TIA
What git add -p <file> does is, very roughly, this (with $file holding the path; the # lines are steps described in words, not real commands):
tmpfile=$(mktemp)
tf2=$(mktemp)
tf3=$(mktemp)
git diff -- "$file" > "$tmpfile"
while [ -s "$tmpfile" ]; do
    # copy the file header plus the first diff hunk from $tmpfile into $tf2,
    # and the header plus the remaining hunks into $tf3
    # show you $tf2 and ask whether you want to include this hunk
    # (with options to edit the hunk, etc.); repeat until ready
    # if you say to *add* the hunk, run: git apply --cached "$tf2"
    cat < "$tf3" > "$tmpfile"
done
rm -f "$tmpfile" "$tf2" "$tf3"
That is, git add -p uses git apply --cached (a specialized sub-variant of git apply --index that ignores the working tree copy of the file). The key takeaway you need, from the above, is this: There are three versions of the file!
The first one (completely ignored here) is frozen for all time and is in the HEAD commit.
The second one is in Git's index aka staging area. That's used by git diff above as the "old version".
The third one is in your working tree. That's used by git diff above as the "new version".
The patches that Git lets you take or skip are simply the result of comparing the "old" (index) and "new" (working tree) version. If you take some patch, Git updates the in-index copy by applying the patch.
Hence, if there is some set of lines in the working tree version (say, lines 100 through 110 inclusive) that you'd like to use to replace some other set of lines (say, lines 90 through 92 inclusive) in the index version, the way to construct that is:
extract the index version;
scrape out lines 1-89 from the index version; concatenate lines 100-110 from the working tree version; concatenate lines 93-end from the index version, all into a temporary file;
replace the index copy with the temporary file.
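The splicing part of these steps can be sketched in Python as a pure list operation. This is only an illustration: the function name and the 1-based, inclusive ranges are my own choices, and the sketch does not touch Git at all.

```python
def splice_lines(index_lines, worktree_lines, old_first, old_last, new_first, new_last):
    """Replace index lines old_first..old_last with working-tree lines
    new_first..new_last. All ranges are 1-based and inclusive."""
    head = index_lines[:old_first - 1]                # index lines 1 .. old_first-1
    middle = worktree_lines[new_first - 1:new_last]   # work-tree lines new_first .. new_last
    tail = index_lines[old_last:]                     # index lines old_last+1 .. end
    return head + middle + tail
```

With the example ranges from the text, splice_lines(idx, wt, 90, 92, 100, 110) keeps index lines 1-89, inserts work-tree lines 100-110, and resumes at index line 93.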
To read the index version, use git show or git cat-file -p with the name of the index version of the file. If the file's name is path/to/file, the index version's name is :path/to/file, short for :0:path/to/file, meaning the copy in slot zero. A file has a slot-zero copy only when it has no copies in slots 1, 2, or 3 (that is, when it is not in a merge conflict), so you can simply attempt to read it from slot zero, and if that fails, assume the file either isn't in the index or is conflicted.
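Reading the slot-zero copy from a script could look like the following sketch. The helper name is hypothetical, and the run parameter exists only so the Git call can be swapped out in a test; by default it calls git cat-file -p through subprocess.

```python
import subprocess

def read_index_version(path, run=subprocess.run):
    """Return the stage-0 (index) copy of `path` as bytes, or None if there is none."""
    # ":0:path" names the copy in index slot zero (":path" is shorthand for it).
    proc = run(["git", "cat-file", "-p", f":0:{path}"], capture_output=True)
    if proc.returncode != 0:
        # Not in the index, or conflicted (only slots 1-3 are occupied).
        return None
    return proc.stdout
```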
Reading the working tree file (some select subset of lines) is left as an exercise, as is the concatenation part, and any error checking you wish to include.
Assuming the final resulting file is in a temporary file named $tf (as a shell variable), to update the index copy, you must first make sure an appropriate blob hash ID exists:
hash=$(git hash-object -w -t blob --path="$path" -- "$tf")
for instance (this assumes you want to run the usual .gitattributes filters, if any, and know that the path is $path). Then, if that goes well, use that hash ID with git update-index:
git update-index --cacheinfo "$mode,$hash,$path"
where $mode is either 100644 or 100755 as appropriate for the file. If you don't want to change the mode, you can read the previous mode with git ls-files --stage or similar. Otherwise, provided core.fileMode is true, read the mode from the working tree copy of the file, to match the behavior of git add: convert "has any executable bit set" to 100755 and "has no executable bit set" to 100644. When core.fileMode is false (use git config --get --type bool core.filemode to read it), git add uses the existing mode for this add-patch case.
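Putting the hash-object and update-index commands together with the mode rule, a sketch might look like this. Both function names are hypothetical; the mode helper follows the core.fileMode = true rule described above, and the run parameter only exists so the Git calls can be replaced in a test.

```python
import stat
import subprocess

def mode_for_worktree_file(st_mode):
    """Map a working-tree st_mode to a blob mode, as git add does when
    core.fileMode is true: any executable bit -> 100755, else 100644."""
    if st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return "100755"
    return "100644"

def replace_index_copy(path, tmp_file, mode, run=subprocess.run):
    """Store tmp_file as a blob and point the index entry for `path` at it."""
    # -w writes the object; --path applies the .gitattributes filters for `path`.
    proc = run(["git", "hash-object", "-w", "-t", "blob", f"--path={path}", "--", tmp_file],
               capture_output=True, text=True, check=True)
    blob_hash = proc.stdout.strip()
    run(["git", "update-index", "--cacheinfo", f"{mode},{blob_hash},{path}"], check=True)
    return blob_hash
```

Typical use: mode = mode_for_worktree_file(os.stat(path).st_mode), then replace_index_copy(path, tmp_file, mode) from the top of the work-tree.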

Detect diff to specific yaml file field between two branches with Git and Bash

I have a YAML file with a field data.version, and I want to detect whether it changed relative to the main branch.
The yaml looks something like this:
# ...
data:
  version: 1.2.3
# ...
There are more fields which are not relevant for this purpose.
I am writing a GitLab-CI script where I have my current commit checked out.
I am able to see the changes in general by using this command:
git fetch origin main
git diff origin/main HEAD -- my_yaml_file
But this does not allow me to detect changes to this specific field...
Is there a way to get and parse the original file from main branch?
Note that I am trying to avoid checking out the entire repository on a temp directory just for that purpose :)
You can get a specific version of a file with git show
git show origin/main:my_yaml_file
After that, you need to parse the YAML file to get the value you want to compare.
For example, using yq:
git show origin/main:my_yaml_file | yq eval ".data.version"
This prints the value of data.version as it is on origin/main.
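If a Python step is more convenient than yq in the CI job, a sketch could read the field on both sides and compare. It assumes PyYAML is installed (imported as yaml) and that origin/main has been fetched as in the question; the helper names are hypothetical.

```python
import subprocess

def get_field(doc, dotted):
    """Walk a parsed YAML mapping along a dotted path like 'data.version'; None if absent."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def yaml_at(rev, path):
    """Parse the YAML file `path` as it exists in `rev` (e.g. 'origin/main')."""
    import yaml  # PyYAML, assumed to be installed
    text = subprocess.run(["git", "show", f"{rev}:{path}"],
                          capture_output=True, text=True, check=True).stdout
    return yaml.safe_load(text)
```

For example, get_field(yaml_at("origin/main", "my_yaml_file"), "data.version") != get_field(yaml_at("HEAD", "my_yaml_file"), "data.version") tells you whether the field changed, without checking out anything.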

How to only list .png files that have been modified in Git

How would one go about only listing the png files that have been modified in the current branch on Git?
My goal is to copy those files to a different directory (I need to send an email).
Suppose I have:
$ git status
On branch update_assessment_pt1
Your branch is up-to-date with 'upstream/devel'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: assessment/LWR/validation/HbepR1/analysis/hbepr1_plot.py
deleted: assessment/LWR/validation/HbepR1/doc/figures/AxialPowerProfile.pdf
deleted: assessment/LWR/validation/HbepR1/doc/figures/AxialProfile.pdf
deleted: assessment/LWR/validation/HbepR1/doc/figures/CladDisp.pdf
deleted: assessment/LWR/validation/HbepR1/doc/figures/FissionGas.pdf
modified: assessment/LWR/validation/HbepR1/doc/figures/FissionGas.png
deleted: assessment/LWR/validation/HbepR1/doc/figures/InterGasPress.pdf
deleted: assessment/LWR/validation/HbepR1/doc/figures/Mesh.pdf
deleted: assessment/LWR/validation/HbepR1/doc/figures/Power.pdf
modified: assessment/LWR/validation/HbepR1/doc/figures/Power.png
new file: assessment/LWR/validation/IFA_431/analysis/ifa431_plot.py
modified: assessment/LWR/validation/IFA_431/doc/figures/431_bol_rod_power.png
modified: assessment/LWR/validation/IFA_431/doc/figures/431r1.png
modified: assessment/LWR/validation/IFA_431/doc/figures/431r2.png
modified: assessment/LWR/validation/IFA_431/doc/figures/431r3.png
How would I go about getting the following, so I can copy those files?
modified: assessment/LWR/validation/HbepR1/doc/figures/FissionGas.png
modified: assessment/LWR/validation/HbepR1/doc/figures/Power.png
modified: assessment/LWR/validation/IFA_431/doc/figures/431_bol_rod_power.png
modified: assessment/LWR/validation/IFA_431/doc/figures/431r1.png
modified: assessment/LWR/validation/IFA_431/doc/figures/431r2.png
modified: assessment/LWR/validation/IFA_431/doc/figures/431r3.png
Use git diff --cached --diff-filter=M --name-only to obtain these file names. Add -- '*.png' to restrict the list to *.png files; without it, the command lists every to-be-committed file whose status is M (modified).
Things to know to keep this from just being a "use this magic command" answer
In your text, you first called these "modified in the current branch". This phrase doesn't mean any one specific thing. Fortunately you then went on to show git status output, where they were listed under Changes to be committed.
Git doesn't store diffs at all. Git stores snapshots (whole files, intact) inside the main unit of storage, which is the commit. That means that in order to see a change, you have to pick two commits: $old and $new. Git will extract both, then compare them. Whatever is different between commit $old and commit $new, Git will tell you about that. The actual change can have any of several change statuses:
A means Added: the file is not in $old and is in $new.
M means Modified: the file is different between $old and $new. The difference could just be the mode of the file: executable, or not.
D means Deleted: the file is in $old, but not in $new.
R, C, T, and some other rare cases can also occur, though some of them may require extra flags to git diff: you won't see an R status unless rename detection is enabled, for instance. (Rename detection is on by default for git diff since Git 2.9, but off in older Git versions.)
Using --name-status, git diff will show you the file names and status letters, instead of showing an actual diff. (Try this out to see.) The --diff-filter argument lets you tell Git: only tell me about files whose status meets the letters I pick.
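To make the status letters concrete, here is a sketch that applies a status filter and a glob by hand to --name-status output, roughly what --diff-filter plus a pathspec do inside Git. It is an illustration only (the function name is hypothetical); in practice, let Git do the filtering.

```python
from fnmatch import fnmatchcase

def filter_name_status(output, statuses="M", pattern="*"):
    """Keep (status, path) pairs from `git diff --name-status` output whose
    status letter is in `statuses` and whose path matches the glob `pattern`."""
    kept = []
    for line in output.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        status = fields[0][0]   # R and C carry a score after the letter, e.g. R100
        path = fields[-1]       # for renames and copies, the last field is the new name
        if status in statuses and fnmatchcase(path, pattern):
            kept.append((status, path))
    return kept
```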
Note, by the way, that the special name HEAD always means the current commit. It does not matter how you made this commit become the current commit, though one typical way is by using git checkout: you git checkout a commit by its hash ID, for instance, and that commit is now checked out and is the current commit. Or, you git checkout a branch name, and the tip commit of that branch is now checked out and is the current commit. There is always[1] a current commit, and you can name it by writing the name HEAD in all uppercase.[2]
All of the above talks about comparing commits, but there are two other places that files can exist, that are not commits. Note that both of these places are temporary: they get wiped out by various operations, and once wiped out, cannot be recovered in Git: you have to copy from these temporary places, into actual commits, to make the files permanent. Once the files are in commits, they're frozen for all time, and can be restored to useful form in the future for as long as the commit itself exists (which tends to be "forever", or as long as the repository exists).
These two places are:
the index, which Git also calls the staging area or (rarely) the cache, and
the work-tree or working tree or any of several variants on this name.
Files that are in the index right now are ready to be committed. Every file that will be committed is in the index right now, even if the index copy matches the current (HEAD) commit copy.
You can, at any time, compare the HEAD commit to whatever is in the index right now. One command that does this is git diff --cached. For every file in HEAD and/or in the index, Git compares the two copies of the file. If they are different, the file is modified. If the index file exists but there is no such file in HEAD, the file is added. If the file exists in HEAD but not in the index, the file is deleted.
You can also, at any time, compare HEAD to the work-tree, or the index to the work-tree. The commands that do this are git diff HEAD and git diff (with no name). Again, for every file on the left-hand side (HEAD or the index), and every file on the right-hand side (in the work-tree), Git compares the two copies of the file.
Last, note that git status runs two git diffs. It does a quick git diff --cached to compare HEAD vs index. Whatever is different here, git status lists that file as to be committed. It also does a quick git diff (with no extra arguments except for --name-only) to compare index vs work-tree. Whatever is different here, git status lists that file as changes not staged for commit.
You wanted to compare HEAD vs index, so you want git diff --cached. You then wanted to list only those files that are Modified, so you can add --diff-filter=M. You didn't want to see the actual differences, nor even the status letters (file names only, please!), so you can add --name-only. You also wanted only to list files whose name matches *.png, so add -- '*.png' to get just those. The quotes protect the * from the shell: we want Git to see the * so that Git can treat it as a pathspec.
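Since the end goal was to copy these files elsewhere, here is a sketch of that in Python. It runs the command described above with -z added, so unusual file names come through unquoted; no shell is involved, so *.png needs no quoting here. It assumes you run it from the top of the work-tree (git diff reports paths relative to the repository root), and it copies the working-tree copy of each file, so a file with further unstaged edits is copied in its edited form. The function names and the destination layout are assumptions.

```python
import os
import shutil
import subprocess

def staged_modified_pngs(run=subprocess.run):
    """List staged files with status M that match *.png (paths relative to the repo root)."""
    proc = run(["git", "diff", "--cached", "--diff-filter=M", "--name-only", "-z", "--", "*.png"],
               capture_output=True, check=True)
    # -z separates names with NUL bytes and leaves them unquoted.
    return [name.decode() for name in proc.stdout.split(b"\0") if name]

def copy_files(names, dest, root="."):
    """Copy each file under `root` into `dest`, keeping its relative directory layout."""
    for name in names:
        target = os.path.join(dest, name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(root, name), target)
```

Usage: copy_files(staged_modified_pngs(), "/path/to/outbox").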
[1] Actually, this is really almost always. There's a special state in which HEAD exists and contains a branch name, but the branch name itself doesn't exist. This state mostly occurs when you create a new, totally-empty repository. Git requires a branch name like master to identify some existing, valid commit hash ID. There are no commits, so there are no valid hash IDs, so master itself is not allowed to exist. Nonetheless, HEAD holds the name master, so that Git will create the master branch when you make the first commit.
[2] On Windows and macOS, you can sometimes get away with using head (lowercase) instead of HEAD (all uppercase). This misbehaves if you start using git worktree add, so it's a bad habit to get into. If you don't like typing HEAD in all capitals, consider using the symbol @, which is a synonym for HEAD.

GitHub Branches: Case-Sensitivity Issue?

I seem to be having an issue with a repository continually recreating branches locally because of some branches on remote. I'm on a Windows machine, so I suspect that it's a case sensitivity issue.
Here's an example couple commands:
$ git pull
From https://github.com/{my-repo}
* [new branch] Abc -> origin/Abc
* [new branch] Def -> origin/Def
Already up to date.
$ git pull -p
From https://github.com/{my-repo}
- [deleted] (none) -> origin/abc
- [deleted] (none) -> origin/def
* [new branch] Abc -> origin/Abc
* [new branch] Def -> origin/Def
Already up to date.
When doing a git pull, the branches in question are capitalized. When I do a git pull -p (for pruning), it first tries to delete lowercased versions of the branches, then create capitalized versions.
The remote branches are capitalized (origin/Abc and origin/Def).
I have tried to temporarily change my Git config such that ignorecase=false (it is currently ignorecase=true). But I noticed no change in behavior. I'm guessing there's something local on my end that's currently holding onto those lowercased branches. But git branch does not show any version of these branches locally.
Short of completely obliterating the repository (a fresh git clone in a separate folder does not pull these phantom branches when trying pulls/fetches), is there anything I can do?
Git is schizophrenic about this.[1] Parts of Git are case-sensitive, so that branch HELLO and branch hello are different branches. Other parts of Git are, on Windows and macOS anyway, case-insensitive, so that branch HELLO and branch hello are the same branch.
The result is confusion. The situation is best simply avoided entirely.
To correct the problem:
1. Set some additional, private and temporary, branch or tag name(s) that you won't find confusing (for instance, remember-abc for abc), to remember any commit hash IDs you really care about, in your own local repository. Then run git pack-refs --all so that all your references are packed. This removes the individual reference files, putting all your references into the .git/packed-refs flat file, where their names are case-sensitive. Your Git can now tell your Abc from your abc, if you have both.
Now that your repository is de-confused, delete any bad branch names. Your temporary names hold the values you want to remember. You can delete both abc and Abc if one or both might be messed up. Your remember-abc has the correct hash in it.
2. Go to the Linux server machine that has the branches that differ only in case from yours. (It's always a Linux machine; this problem never occurs on Windows or macOS servers because they do the case-folding early enough that you never create the problem in the first place.) There, rename or delete the offending bad names.
The Linux machine has no issues with case (branches whose names differ only in case are always different), so there is no weirdness here. It may take a few steps, and a few git branch commands to list all the names, but eventually you'll have nothing but clear and distinct names: there will be no branches named Abc and abc both.
If there are no such problems on the Linux server, step 2 is "do nothing".
3. Use git fetch --prune on your local system. You now no longer have any bad names as remote-tracking names, because in step 2, you made sure that the server (the system your local Git calls origin) has no bad names, and your local Git has made your local origin/* names match their branch names.
4. Now re-create any branch names you want locally, and/or rename the temporary names you made in step 1. For instance, if you made remember-abc to remember abc, you can just run git branch -m remember-abc abc to move remember-abc to abc.
If abc should have origin/abc set as its upstream, do that now:
git branch --set-upstream-to=origin/abc abc
(You can do this in step 1 when you create remember-abc, but I think it makes more sense here, so I put it in step 4.)
There are various shortcuts you can use, instead of the 4 steps above. I listed all four this way for clarity of purpose: it should be obvious to you what each step is intended to accomplish and, if you read the rest of this, why you are doing that step.
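For step 2, one way to spot names that collide only in case is to list every reference and group the names case-insensitively, including their directory parts (abc/x and Abc/y collide on a case-insensitive file system even though the full names differ). This sketch assumes you feed it the output of git for-each-ref --format='%(refname)', one full ref name per line; the function name is hypothetical.

```python
from collections import defaultdict

def case_collisions(refnames):
    """Find ref-name prefixes (directories or full names) that differ only in case."""
    spellings = defaultdict(set)
    for name in refnames:
        parts = name.strip().split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            if prefix:
                spellings[prefix.casefold()].add(prefix)
    # Each group lists the different spellings of one case-insensitive name.
    return sorted(sorted(group) for group in spellings.values() if len(group) > 1)
```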
The reason the problem occurs is outlined in nowox's answer: Git sometimes stores the branch name in a file name, and sometimes stores it as a string in a data file. Since Windows and macOS file systems usually conflate names that differ only in case, the file-name variant retains its original case but ignores attempts to create a second file whose name differs only in case, so Git ends up treating Abc and abc as the same. The data-in-a-file variant retains the case distinction as well as the value distinction, and believes that Abc and abc are two different branches that identify two different commits.
When git rev-parse refs/heads/abc or git rev-parse refs/remotes/origin/abc gets its information from .git/packed-refs—a data file containing strings—it gets the "right" information. But when it gets its information from the file system, an attempt to open .git/refs/heads/abc or .git/refs/remotes/origin/abc actually opens .git/refs/heads/Abc (if that file exists right now) or the similarly-named remote-tracking variant (if that file exists), and Git gets the "wrong" information.
Setting core.ignorecase (to anything) does not help at all as this affects only the way that Git deals with case-folding in the work-tree. Files inside Git's internal databases are not affected in any way.
This whole problem would never come up if, e.g., Git used a real database to store its <reference-name, hash-ID> table. Using individual files works fine on Linux. It does not work fine on Windows and macOS, not this way anyway. Using individual files could work there if Git didn't store them in files with readable names: for instance, instead of refs/heads/master, perhaps Git could use a file named refs/heads/6d6173746572, though that halves the available component-name length. (Exercise: how is 0x6d m, 0x61 a, and so on?)
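The hex-naming idea above is a thought experiment, not something Git does, but it is easy to try: writing each name's UTF-8 bytes as lowercase hex gives file names that stay distinct even on a case-insensitive file system. The function names are hypothetical.

```python
def hex_ref_component(name):
    """Encode a ref-name component as lowercase hex of its UTF-8 bytes (2 chars per byte)."""
    return name.encode("utf-8").hex()

def unhex_ref_component(encoded):
    """Decode a component produced by hex_ref_component."""
    return bytes.fromhex(encoded).decode("utf-8")
```

hex_ref_component("master") gives "6d6173746572", while "Abc" and "abc" become "416263" and "616263", two names that no case-folding file system will merge.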
[1] Technically, this is the wrong word. It's sure descriptive though. A better word might be schizoid, as used in the title of one episode of The Prisoner, but it too has the wrong meaning. The root word here is really schism, meaning split and somewhat self-opposed, and that's what we're driving at here.
In Git, branches are just pointers to a commit. The branches are stored as plain files in your .git directory.
For instance, you may have abc and def files in .git/refs/heads.
$ tree .git/refs/heads/
.git/refs/heads/
├── abc
├── def
└── master
The content of each of these files is just the hash ID of the commit the branch points to.
I am not sure, but I think the option ignorecase is only relevant to your working directory, not the .git folder. So to remove the weird capitalized branches, you may just need to remove/rename the files in .git/refs/heads.
In addition to this, the upstream link from a local branch to a remote branch is stored in the .git/config file. In this file you may have something like:
[branch "Abc"]
remote = origin
merge = refs/heads/abc
Notice in this example that the local branch is named Abc, but the branch it tracks on the remote is abc (lowercase).
To solve your issue I would try to:
Modify the .git/config file
Rename the corrupted branches in .git/refs/heads; for example, rename abc to abc-old
Try your git pull
The answers supplied by nowox and torek were very helpful, but did not contain the exact solution. The existing references to the remote in .git/config, and the files in .git/refs/heads, did not contain any versions of abc or def.
Instead, the problem existed in .git/refs/remotes/origin.
My .git/refs/remotes/origin directory had references to the lowercased versions of these feature branch folders. Some feature branches were made under abc and def using the lowercased versions, but they no longer exist on remote. The creator of these feature branches recently switched to using Abc and Def on remote. I deleted .git/refs/remotes/origin/abc and .git/refs/remotes/origin/def then executed fresh git pull -p commands. New folders, Abc and Def, were created, and subsequent pulls or fetches correctly display Already up to date.
Thanks to nowox and torek for getting me on the right track!
I did the following to solve my problem:
I navigated to the .git/refs/remotes/origin folder.
I deleted the folder with the buggy branch name.
I did git pull in the terminal.
I ran into a similar problem today. I did the following to solve it:
Rename the 2nd branch to another name
Rename the 1st branch to 2nd_branch_old_name
git push origin 1st_branch_new_name

Git how to get hashid of last GitHub release tag

Every 10-20 commits I have a commit of the form "version uped to x.y.z" that is also tagged, since it's a GitHub release point. Example below. I need to get the hash ID of the last such commit so I can use it in a script like "git rebase -i $(hashid)". That commit is a freezing point: the existing commits up to it should not be changed. There are two possible ways to get it: search for the last commit whose message starts with "version uped", or search for the last commit that has a tag. I am not skilled with bash, so please assist.
dfd48cd (HEAD -> master, origin/master, origin/HEAD) Operator [:] for GreedyRange removed
b610256 Array GreedyRange docs updated
e6a1446 Embedded docstring updated
825bf83 moved gallery and deprecated_gallery to new folders
9414a55 Kaitai comparison schemas moved to a folder
61e9ccb Padded fixed, negative length check and docstring
ad6148c FixedSized updated, changed build semantics
979538d FixedSized NullTerminated NullStripped fixed, _parsereport and docstrings
4719d67 lib/py3compat updated, supportsintflag supportsintflag more accurate
9c164d4 makefile added xfails profile
672fefa (tag: v2.9.40) version uped to 2.9.40
In this example, output would be: 672fefa52b537c17f5ede90996b9156eb0e040ac
Here is one way:
git tag --sort -v:refname | head -1 | xargs git rev-list -n 1
Explained:
Get the list of all tags, sorted descending by version number
Pick the first one (the one with the highest version number)
Pass it to git rev-list to find the commit hash it references
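If you'd rather do this from a script than a pipeline, here is a sketch. It assumes tags look like vX.Y.Z and sorts them by their numeric parts, which approximates (but is not identical to) Git's v:refname version sort; the helper names are hypothetical. The commit lookup uses the same git rev-list -n 1 call as the pipeline.

```python
import re
import subprocess

def latest_version_tag(tags):
    """Pick the tag with the highest numeric version, e.g. v2.10.0 over v2.9.40."""
    def version_key(tag):
        # Compare numbers, not strings: as strings, "2.9.40" sorts before "2.9.9".
        return [int(n) for n in re.findall(r"\d+", tag)]
    return max(tags, key=version_key) if tags else None

def commit_of(tag, run=subprocess.run):
    """Return the full hash of the commit that `tag` points to."""
    proc = run(["git", "rev-list", "-n", "1", tag], capture_output=True, text=True, check=True)
    return proc.stdout.strip()
```

Feed latest_version_tag the output of git tag split into lines, then pass the result to commit_of.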
