How can I get a list of first parent commits using git python? - gitpython

I'd like to run the equivalent of this command from git python, but have not been able to figure out a way to accomplish this.
git rev-list --first-parent commit1..HEAD
I'm looking to get the result of that command into an iterable of git python's Commit objects. I tried repo.iter_commits but it doesn't appear to be capable of taking in arguments to rev-list that don't take parameters.
My use case is that "commit1" will be the commit upon which a branch was based, and I'll run this code while the branch is checked out. Thus, this command would give me the list of commits committed to the branch, even in the presence of merge commits from the branch "commit1" is on.
I have also tried
repo.iter_commits('HEAD ^commit1')
but that results in the following error:
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git rev-list HEAD ^commit1 --
stderr: 'fatal: bad revision 'HEAD ^commit1'
However, I can run
git rev-list HEAD ^commit1 --
in bash and it runs fine. And besides, the command doesn't really give me what I need.

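I was able to get what I needed by working directly with the commit's parents list in git python. Here is a snippet of what worked for me (commit1 is a Commit object here, e.g. repo.commit('commit1')):
commits = list()
c = repo.head.commit
while c != commit1:
    commits.append(c)
    if not c.parents:
        break  # reached a first (root) commit without meeting commit1
    c = c.parents[0]  # follow only the first parent
This collects HEAD and its first-parent ancestors down to, but not including, commit1, which is what git rev-list --first-parent commit1..HEAD lists. It also stops at a first commit, which has no parents, instead of failing with an IndexError.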

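You can use:
repo.iter_commits(repo.head.commit, first_parent=True)
GitPython turns the keyword argument first_parent=True into the --first-parent option of git rev-list. That walks all the way back from HEAD; to stop at commit1, as in the original command, pass the range as a single revision string:
repo.iter_commits('commit1..HEAD', first_parent=True)
(The 'HEAD ^commit1' attempt fails because it reaches git as one argument containing a space, hence "bad revision".)
(I'm working with git python version 3.1.29.)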
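To make the difference between a first-parent walk and a full range concrete, here is a small Python sketch that does not use GitPython at all. It models a hypothetical commit graph as a dict mapping each commit to its parents (first parent first): A plays the role of commit1, X is a commit from the other branch, and M merges X into the branch. The helper names are made up for illustration.

```python
# Hypothetical commit graph: commit -> list of parents, first parent first.
# A is "commit1" (the branch point); X lives on the other branch and was
# merged into our branch by merge commit M; D is HEAD.
PARENTS = {
    "A": [],
    "X": ["A"],
    "B": ["A"],
    "M": ["B", "X"],
    "D": ["M"],
}


def first_parent_range(parents, tip, base):
    """Follow only first parents from tip, stopping before base."""
    result = []
    commit = tip
    while commit != base:
        result.append(commit)
        if not parents[commit]:  # root commit reached without meeting base
            break
        commit = parents[commit][0]
    return result


def ancestors(parents, start):
    """All commits reachable from start, including start itself."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def full_range(parents, tip, base):
    """Like base..tip without --first-parent: reachable from tip but not base."""
    return ancestors(parents, tip) - ancestors(parents, base)


print(first_parent_range(PARENTS, "D", "A"))  # ['D', 'M', 'B']
print(sorted(full_range(PARENTS, "D", "A")))  # ['B', 'D', 'M', 'X']
```

The first-parent walk leaves out X, the commit that only arrived through the merge, which is exactly why --first-parent suits the "commits made on this branch" use case.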

Related

Check if my current branch on local fork exists on remote upstream

git branch -a | egrep 'remotes/upstream/master' tells me if master branch exists on upstream or not. That is good.
Let's say my current branch is named my_branch, which is the output of git rev-parse --abbrev-ref HEAD.
Question
In a bash script, how can I get the output of git rev-parse --abbrev-ref HEAD into a variable and check whether that branch exists on upstream? I tried the following, but it does not work:
#!/bin/bash
git checkout master
export MY_BRANCH=`git rev-parse --abbrev-ref HEAD`
git branch -a | egrep 'remotes/upstream/${MY_BRANCH}$'
# The last command above should output "remotes/upstream/master", the way "git branch -a | egrep 'remotes/origin/master'" works for origin, but it doesn't.
PS:
Should I have to do git fetch --all for this to work correctly synced with upstream?
Your own answer works. There are a number of shortcuts you can use, but before we visit them, let's start with this:
Should I have to do git fetch --all for this to work correctly synced with upstream?
Given that you're querying your own Git about what it remembers about the remote you're calling upstream, it is a good idea to run git fetch upstream. Using git fetch --all is not a bad idea, and you can use that instead if you'd like to fetch from both origin and upstream, so as to update all your origin/* and upstream/* names. Given that you're only querying one of the upstream/* names, though, all you need is an update on those. Indeed, you could run git fetch upstream $MY_BRANCH here, although now we're really getting into hair-splitting.1
Now, as to the shortcuts: git rev-parse does everything you need. You don't need git branch -a and grep. (You definitely don't need egrep: grep was originally very simple, then got complicated-up into grep, fgrep, and egrep. It's been re-simplified in most modern systems, so that when you do need grep, you can just run grep, with whatever interface flags you like for the desired behavior. The program will pick the right algorithm, and run fast on its own.)
git rev-parse --abbrev-ref HEAD: this prints the current branch name, if there is a current branch name. If there is no current branch name, this prints HEAD. That's almost certainly what you want.3
git rev-parse refs/remotes/upstream/$MY_BRANCH: this prints the hash ID corresponding to the name refs/remotes/upstream/$MY_BRANCH. If there is no such name, it prints an error message (see below).
The refs/ in front of remotes/upstream/$MY_BRANCH is simply the way we use the full, un-ambiguous name in Git. Git normally strips off refs/heads/, refs/tags/, refs/remotes/, or refs/ from ref names, all of which start with refs/. This leaves us with master as a branch name, v1.2 as a tag name, and either origin/master or remotes/origin/master as a remote-tracking name.
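A deliberately simplified Python sketch of that prefix stripping follows. shorten_ref is a hypothetical helper; real Git also checks whether a short name would be ambiguous, which this does not. The order matters: the generic refs/ prefix is tried last.

```python
# Namespace prefixes, most specific first; plain "refs/" is the fallback.
PREFIXES = ("refs/heads/", "refs/tags/", "refs/remotes/", "refs/")


def shorten_ref(full_name):
    """Strip the first matching namespace prefix from a full ref name."""
    for prefix in PREFIXES:
        if full_name.startswith(prefix):
            return full_name[len(prefix):]
    return full_name  # not under refs/, e.g. "HEAD"


for name in ("refs/heads/master", "refs/tags/v1.2", "refs/remotes/origin/master"):
    print(name, "->", shorten_ref(name))
```

This prints master, v1.2, and origin/master, matching the branch, tag, and remote-tracking examples in the paragraph above.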
As you've seen, git branch -a doesn't take off the remotes/ part from refs/remotes/origin/master or refs/remotes/upstream/master. However, if you were to use git branch -r, it would take the remotes/ part off, leaving you with, e.g., origin/master and upstream/master. There is no obvious reason that git branch behaves this way; it just does. By using git rev-parse instead, you avoid having to deal with this particular inconsistency.
1Presumably, the point of splitting hairs here is to make this go as fast / efficiently as possible. A git fetch --all has your Git call up each remote, one at a time,2 and fetch everything from each. So this will take as long as it takes to work its way all the way through every remote, one at a time. Meanwhile, a git fetch upstream calls up just the one remote (upstream) and then fetches everything there is at the one upstream. A git fetch upstream $MY_BRANCH calls up just the one remote and then asks it only for new commits and such on the one branch.
Let's compare the three:
--all dials up every "phone number". There's a certain amount of slack time in looking up the "phone number" (Internet address) and making the "call" to the server there. So this could take a few extra seconds. But once done, everything is updated and you don't have to run git fetch until things could have changed: depending on how busy these servers are, anywhere from days, to seconds.
Using one remote dials one "phone number", then gets all new commits from all branches and updates all the origin/* or upstream/* or whatever names. This takes a little longer (sometimes milliseconds, sometimes seconds) than updating just one branch. But once done, you'll need to repeat it for any other remote.
Using one remote and one branch name might gain you a few milliseconds, or as much as a few seconds. This is usually much smaller than the gain from avoiding --all. How much difference does it make for your particular cases? You'll have to measure.
That last one is really the bottom line, as it were. Measure, measure, measure. Or, write something convenient to write, and if it works well enough, stop. :-)
2A future Git might be able to call them up in parallel to take advantage of multiple CPUs and so on. This could change the calculus a bit, or even a lot.
3There are several corner cases to consider, as we'll see in the "errors" section below.
Error messages and corner cases
While git rev-parse is great for things that do work, there are things that don't work. For instance, here is what happens if I try to rev-parse a name that doesn't exist:
$ git rev-parse refs/remotes/upstream/foobranch
refs/remotes/upstream/foobranch
fatal: ambiguous argument 'refs/remotes/upstream/foobranch': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
The output here is a bit messy:
The standard output from rev-parse is just refs/remotes/upstream/foobranch. That is, I gave rev-parse a name that does not parse, so it gave that name right back to me.
The stderr output from rev-parse has the fatal: ..., Use '--' ..., and final line.
If all works, we get this:
$ git rev-parse refs/remotes/origin/master
225365fb5195e804274ab569ac3cc4919451dc7f
As before, you might want to save this in a variable:
$ result=`git rev-parse refs/remotes/origin/master`
$
The shell (sh or bash here) collects the standard output and assigns that to the variable ($result here); the standard error output goes through to the terminal:
$ result=`git rev-parse refs/remotes/upstream/foobranch`
fatal: ambiguous argument 'refs/remotes/upstream/foobranch': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
Note that this time, since the standard output went into $result, we did not see refs/remotes/upstream/foobranch echoed.
The backquote construct:
var=`cmd`
is usually better expressed as:
var=$(cmd)
if for no other reason than parentheses nest. That is:
var=`cmd1 `cmd2` -- `cmd3``
might be an attempt to run cmd1 with the output from cmd2 and cmd3 as arguments. This does not work, but:
var=$(cmd1 $(cmd2) -- $(cmd3))
does work. So we should do that.
We can get or check the status of the $(...) command:
var=$(cmd) || {
echo "unable to run the given command"
exit 1
}
for instance. So with git rev-parse, if the upstream doesn't exist, we might want:
result=$(git rev-parse refs/remotes/upstream/$MY_BRANCH) || exit
which will quit the script if git rev-parse fails. The rev-parse will still print an error message to stderr. To suppress that, we have:
--quiet --verify
as options to rev-parse, or we can redirect stderr to /dev/null; this part is your choice.
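For comparison, the same capture-and-check pattern (var=$(cmd) || exit) looks roughly like this in Python. This is a sketch: capture_or_none is a hypothetical helper, and the stand-in commands run the Python interpreter itself so the example works without a repository. In a real script the command would be something like ["git", "rev-parse", "--quiet", "--verify", "refs/remotes/upstream/" + branch].

```python
import subprocess
import sys


def capture_or_none(cmd):
    """Return cmd's stdout with surrounding whitespace stripped, or None on failure.

    stderr is captured and discarded, similar to redirecting it to /dev/null.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


# Stand-ins for `git rev-parse --quiet --verify refs/remotes/upstream/<branch>`:
found = capture_or_none([sys.executable, "-c", "print('225365fb')"])
missing = capture_or_none([sys.executable, "-c", "import sys; sys.exit(1)"])
if missing is None:
    print("upstream branch does not exist")  # the `|| { ...; exit 1; }` path
```

As in the shell version, the caller decides what a failure means; here it is signalled by None instead of a nonzero status.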
Finally, there are some corner cases to consider with git rev-parse --abbrev-ref HEAD:
You can be on a detached HEAD. In this case, as already noted, git rev-parse just prints HEAD. If that's what you'd like, go with that.
You can be in a completely empty repository, or on an unborn branch created with git checkout --orphan or git switch --orphan. In this case, git rev-parse produces an error message, and no output. You should check for this.
Optionally, you can use git symbolic-ref --short HEAD. This has different behavior: it produces the name of the branch if you are on a branch, including in the special case of an empty repository or orphan/unborn branch. But, when in detached-HEAD mode, it produces an error message.
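The two commands disagree because of what HEAD itself holds. In an ordinary repository (an assumption: this ignores per-worktree HEADs and alternative ref storage backends), .git/HEAD contains either a symbolic line such as ref: refs/heads/my_branch, even when that branch is unborn, or a raw commit hash when HEAD is detached. A rough Python sketch of classifying those contents, with describe_head as a hypothetical helper:

```python
def describe_head(head_contents):
    """Classify the text of a .git/HEAD file.

    Returns ("branch", name) for a symbolic HEAD (this includes an unborn
    branch, whose ref does not exist yet) or ("detached", hash) otherwise.
    """
    text = head_contents.strip()
    if text.startswith("ref: "):
        target = text[len("ref: "):]
        if target.startswith("refs/heads/"):
            target = target[len("refs/heads/"):]
        return ("branch", target)
    return ("detached", text)


print(describe_head("ref: refs/heads/my_branch\n"))
print(describe_head("225365fb5195e804274ab569ac3cc4919451dc7f\n"))
```

The symbolic case is where git symbolic-ref --short HEAD succeeds, and the detached case is where it reports an error.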
So, in the end, the shortened script is:
MY_BRANCH=$(git rev-parse --abbrev-ref HEAD) || exit
hash=$(git rev-parse --quiet --verify refs/remotes/upstream/$MY_BRANCH) || {
echo "upstream/$MY_BRANCH does not exist"
exit 1
}
# do whatever you like with upstream/$MY_BRANCH here
If you want the hash ID corresponding to upstream/$MY_BRANCH, it is in $hash now.
Note that there's no point in exporting MY_BRANCH unless you want to access it from other programs that read an environment variable.
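I figured out that I needed to remove the single quotes (inside single quotes, ${MY_BRANCH} is not expanded), and it works after that:
git branch -a | egrep remotes/upstream/$MY_BRANCH
The following also works as a one-liner:
git branch -a | egrep remotes/upstream/`git rev-parse --abbrev-ref HEAD`
Note that dropping the quotes also dropped the trailing $ anchor, so these also match a name such as remotes/upstream/master-old. Double quotes expand the variable and keep the anchor: egrep "remotes/upstream/${MY_BRANCH}$"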
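Without an anchor, grep does substring matching. This Python sketch shows the difference with an exact-match check on git branch -a output. remote_branch_exists is a hypothetical helper, and the sample output (two-column marker, "->" for symbolic refs) is assumed to follow the usual git branch -a layout; for scripts, the git rev-parse approach above avoids parsing this output at all.

```python
def remote_branch_exists(branch_a_output, remote, branch):
    """Exact-match check on `git branch -a` output (no substring matching)."""
    wanted = f"remotes/{remote}/{branch}"
    for line in branch_a_output.splitlines():
        name = line[2:]  # drop the two-column "* " / "  " marker
        name = name.split(" -> ")[0]  # e.g. "remotes/origin/HEAD -> origin/master"
        if name == wanted:
            return True
    return False


BRANCH_A_SAMPLE = """\
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/master
  remotes/upstream/master-old
"""

print(remote_branch_exists(BRANCH_A_SAMPLE, "upstream", "master"))  # False
print("remotes/upstream/master" in BRANCH_A_SAMPLE)  # True: substring false positive
```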

How to check programmatically git commands output

Some git commands can give exit code 1 even though nothing actually went wrong: for instance, if I try git commit -m <something> but there's nothing to commit, or git pull origin master when there are no changes and my local branch is up to date with the remote.
E.g.: "Commit failed - exit code 1 received" when trying to commit a new local repository in GitHub Desktop
How then to check that the git commands didn't actually fail and that it just says "nothing there to do"
You could store the output of git commit in a variable, both when there is something to commit and when there is nothing to commit.
Afterwards, use an if statement to test whether the stored output contains git's "nothing to commit" message. That message is distinctive, so when it matches you can print your own message, such as "There was no code to commit", instead of treating the exit code as a failure.
Likewise, the output of a commit that did have changes can be tested, and you can print a custom message for that case too.
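Here is that output-testing idea as a small Python sketch. classify_commit is a hypothetical helper, and the marker strings are an assumption based on Git's usual English messages; check them against your Git version, and note that they change if Git's output is localized. An alternative that avoids parsing text is to run git diff --cached --quiet before committing: it exits 0 when nothing is staged and 1 when something is.

```python
# Messages git commit usually prints on stdout when nothing is staged.
# Assumption: English, unlocalized output.
NOTHING_TO_COMMIT_MARKERS = (
    "nothing to commit",
    "nothing added to commit",
    "no changes added to commit",
)


def classify_commit(returncode, stdout):
    """Interpret the result of `git commit` (hypothetical helper)."""
    if returncode == 0:
        return "committed"
    if any(marker in stdout for marker in NOTHING_TO_COMMIT_MARKERS):
        return "nothing to commit"
    return "failed"


# returncode and stdout would come from, e.g.:
# subprocess.run(["git", "commit", "-m", msg], capture_output=True, text=True)
print(classify_commit(1, "On branch master\nnothing to commit, working tree clean\n"))
```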

git add - can I force exit code 0 when the only error is the presence of ignored files?

Main question
Pushing all local changes since the last commit and push is an operation very frequently used when working with git, so it may be beneficial to optimize its usage into one command that would have a script like this as its backbone:
git add * && git commit -m "my-message" && git push
But for some reason this failed in the repo I tried to use this in. After some searching, I found out using echo $? that when ignored files are present git add * returns 1. This is usually not a problem, as it doesn't affect typing out the next command, but in a script it is critical to be able to automatically decide whether to continue. The obvious workaround to this problem is simply using
git add *; git commit -m "my-message" && git push
but I want to avoid this as it continues execution even if actual errors happen in git add.
Is there a configuration option (or any other solution) that makes git report an exit code of 0 when ignored files are included in the add command but no other error happens?
PS
I know that automating this is not a very wise use of time, that is not the purpose.
This is just the most simplified version of the code that fails, the actual script doesn't use hard coded commit messages.
This is caused by * being expanded by Bash to all the non-hidden files in the current directory (by default) and passed as separate arguments to git add. Git refuses to add a path you name explicitly if it is ignored (it suggests -f if you really want to add it), and it reports this with exit code 1.
git add . does basically the same thing, the only difference being that it also adds dotfiles, and it returns exit code 0 even if the current directory includes ignored files.
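The expansion step can be reproduced with Python's glob module, which follows the same rule as Bash's default * (names starting with a dot are skipped, assuming dotglob is off). In this sketch build.log stands for a file that a hypothetical .gitignore would ignore; it still ends up as an explicit argument, which is what makes git add complain.

```python
import glob
import os
import tempfile

# Throwaway directory with a normal file, a dotfile, and a file that a
# hypothetical .gitignore would ignore (build.log).
with tempfile.TemporaryDirectory() as d:
    for name in ("a.txt", ".hidden", "build.log"):
        open(os.path.join(d, name), "w").close()
    # Like the shell's *, glob.glob skips names that start with a dot.
    expanded = sorted(os.path.basename(p) for p in glob.glob(os.path.join(d, "*")))

print(expanded)  # ['a.txt', 'build.log']
```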

Merge and commit in a single step with Mercurial

On Mercurial if I want to merge a branch I always have to perform it in two steps:
hg merge my_branch
hg ci -m "I just merged my branch"
Is there a way in which I could do the same in a single command (having it aborted if conflicts are found)?
hg merge my_branch -ci "I merged my branch"
Would
hg merge my_branch && hg ci -m "I just merged my branch"
be acceptable or do you need to call hg only once?
&& executes the following command only if the first one returns successfully.
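The same short-circuit behaviour can be mirrored in Python when the automation is not a shell script. In this sketch, run_and_chain is a hypothetical name; it runs each command only if the previous one exited with status 0, and the stand-in commands use the Python interpreter so it runs without Mercurial installed.

```python
import subprocess
import sys


def run_and_chain(*commands):
    """Run each command in turn, stopping after the first nonzero exit status.

    Mirrors `cmd1 && cmd2 && ...`: returns the exit status of the last
    command that actually ran (0 if every command succeeded).
    """
    status = 0
    for cmd in commands:
        status = subprocess.run(cmd).returncode
        if status != 0:
            break
    return status


# Real use would be something like:
# run_and_chain(["hg", "merge", "my_branch"], ["hg", "ci", "-m", "I just merged my branch"])
ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_and_chain(ok, ok))    # 0
print(run_and_chain(fail, ok))  # 3, and the second command never runs
```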
Edit:
If you need the command as an alias with arguments, this gets a bit more complicated, as a shell alias cannot place its arguments in the middle of the command. You can use a function and alias it, though (quoting "$2" keeps a multi-word message together):
branch_and_commit() { hg merge "$1" && hg ci -m "$2"; } ; alias bac=branch_and_commit
You can then call it like this:
bac my_branch_id "this is the comment"
I would not advise executing the two as a single operation. The main reason is that, during a merge conflict, you will need to resolve it before committing. So when the first operation fails, you are left with a state that requires another commit anyway.
If you need to do this to simplify the ideal-case workflow, at least create a script that handles the conflict case and reverts to the starting revision, so that the user can merge and commit normally. A simple alias will cause you more trouble than it saves, in the end.

How can I detect whether a git commit is a parent of other commits?

I'm writing a script that makes some trivial changes and then commits them to git. Because these are trivial changes, I want to do git commit --amend whenever I can get away with it -- specifically, when an amend won't "mess up" any other branches' history. If an amend would mess up another branch, I want to do a standard git commit instead.
For example, if my branches looked like this (a la "Visualize all branch history" in Git GUI):
* [experimental branch] Added feature.
* [master branch] Trivial change from script
* ...
and I'm running this script on the master branch, then I don't want to do an amend, because I would be replacing part of the experimental branch's history. Technically, this won't actually break anything -- the original commit will still be part of experimental's history, and will still be referenced so it won't get garbage collected -- but having nearly-but-not-quite-identical commits in two different branches makes life difficult when I later want to rebase or merge, so it's a situation I want to avoid.
How can I make my script automatically detect whether a commit has anything branched from it?
If simplifying assumptions help, I always run this script on the head of master, and I only use git as a local repository -- I don't push or pull changes anywhere.
This script is in Ruby, so I can either shell out to the git command line, or I can use Ruby bindings for git -- whichever would make this task easier.
Just run git branch --contains HEAD to get a list of branches that "contain" this commit. The current branch always appears in that list (its tip is HEAD), so if it is the only branch listed, the commit should be safe for amending. You also might want to include the -a flag to list local AND remote branches.
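The question's script is in Ruby, but the parsing is the same in any language; here it is as a Python sketch. other_branches_containing is a hypothetical helper, and it assumes the usual git branch layout, where the current branch is the line marked with "* ". An empty result means no other branch would be affected by an amend.

```python
def other_branches_containing(branch_contains_output):
    """Parse `git branch --contains <commit>` output, dropping the current branch."""
    others = []
    for line in branch_contains_output.splitlines():
        if line.startswith("* "):  # the current branch itself
            continue
        name = line[2:].strip()  # drop the two-column marker
        if name:
            others.append(name)
    return others


print(other_branches_containing("* master\n  experimental\n"))  # ['experimental']
print(other_branches_containing("* master\n"))  # [] -> safe to amend
```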
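Alternatively, you could compare the output of git rev-parse HEAD with git merge-base HEAD other-branch. If these commit IDs are identical, the current commit is in other-branch's commit history. (git merge-base --is-ancestor HEAD other-branch performs the same test directly, exiting with status 0 when it holds.)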
The commit graph is one-way: given a commit, you know all of its ancestors, but not any of its children. You'll have to start at the endpoints and backtrack until you get to the commit(s) that you want.
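Use git rev-list [some commit] --children. Unlike git log, git rev-list needs at least one starting commit, so name HEAD explicitly. Only children reachable from the commits you name are reported, so to catch children on other branches (such as experimental), pass --all instead of HEAD: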
$ git rev-list HEAD --children
6edbee61c87fb063700751815f0ad53907d0b7a4
aee452860ecd772b8bdcd27227e6a72e6f4435fd 6edbee61c87fb063700751815f0ad53907d0b7a4
ef8a1487b03256a489d135e76d1f0b01872f2349 aee452860ecd772b8bdcd27227e6a72e6f4435fd
6910dc5833f6cd26133e32bef40ed54cf9337017 ef8a1487b03256a489d135e76d1f0b01872f2349
bbef0da56efe048f70293bd20bad0cb37b5e84f0 6910dc5833f6cd26133e32bef40ed54cf9337017
[...]
The left-hand column is a list of commit SHA-1s in reverse chronological order. Anything to the right of a commit's SHA-1 is a child of that commit (this is what --children adds). The top commit is HEAD, so it has no children within the listed history.
Thus, if you grep for your SHA-1 in this list and it has anything to its right, it has at least one child:
$ git rev-list --children HEAD | grep '^6910'
6910dc5833f6cd26133e32bef40ed54cf9337017 ef8a1487b03256a489d135e76d1f0b01872f2349
In the above example, commit 6910... has a child, ef8a....
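Instead of grepping, a script can parse the whole listing once into a map from each commit to its children. The Python sketch below (children_map is a hypothetical helper) uses the sample output shown above; a commit whose child list is empty has nothing built on top of it within the listed history.

```python
def children_map(rev_list_children_output):
    """Map each commit in `git rev-list --children` output to its child list."""
    children = {}
    for line in rev_list_children_output.splitlines():
        fields = line.split()
        if fields:
            children[fields[0]] = fields[1:]
    return children


REV_LIST_SAMPLE = """\
6edbee61c87fb063700751815f0ad53907d0b7a4
aee452860ecd772b8bdcd27227e6a72e6f4435fd 6edbee61c87fb063700751815f0ad53907d0b7a4
ef8a1487b03256a489d135e76d1f0b01872f2349 aee452860ecd772b8bdcd27227e6a72e6f4435fd
6910dc5833f6cd26133e32bef40ed54cf9337017 ef8a1487b03256a489d135e76d1f0b01872f2349
bbef0da56efe048f70293bd20bad0cb37b5e84f0 6910dc5833f6cd26133e32bef40ed54cf9337017
"""

kids = children_map(REV_LIST_SAMPLE)
print(kids["6910dc5833f6cd26133e32bef40ed54cf9337017"])  # one child: ef8a...
print(kids["6edbee61c87fb063700751815f0ad53907d0b7a4"])  # [] (HEAD, no children)
```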

Resources