Git - Programmatically determine if local commits have not been pushed - bash

I've seen lots of posts/answers on how to display all of the local commits for all local branches that have not been pushed.
I have a narrow use case, and have not found an answer.
I need to determine from within a bash script, if my CURRENT branch has commits that have not been pushed upstream to the same branch. A count would be fine, but I really just need to know if a push has not yet been done. I don't care about any branches except the current branch, and in this case I've already checked to see if the branch is local (i.e. has not yet set upstream origin).
Mainly, I don't want the commits printed out. I just want to know that the number of unpushed commits is > 0.

The upstream of the current branch is @{u}. @ is a synonym for HEAD.
@..@{u} selects the commits which are upstream, but not local. If there are any, you are behind. We can count them with git rev-list.
# Move 4 commits behind upstream.
$ git reset --hard @{u}~4
$ git rev-list @..@{u} --count
4
@{u}..@ does the opposite. It selects commits which are local but not upstream.
# Move to upstream
$ git reset --hard @{u}
# Add a commit
$ git commit --allow-empty -m 'test'
[main 6dcf66bda1] test
$ git rev-list @{u}..@ --count
1
And if both are 0 you are up to date.
Note: this will only be "up to date" as of your last fetch. Whether you want to combine this with a fetch is up to you, but one should get used to the fact that Git does not regularly talk to the network.
(I've borrowed from ElpieKay's answer).
See gitrevisions for more about @, .. and many ways to select revisions.
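For use inside a script, the count can be wrapped in a small helper. This is a minimal sketch, assuming the current branch already has an upstream configured; the function names unpushed_count and has_unpushed are made up for illustration.

```shell
#!/usr/bin/env bash

# Print the number of commits on the current branch that are not in its upstream.
# Fails (non-zero status) if no upstream is configured or this is not a repository.
unpushed_count() {
    git rev-list --count '@{u}..HEAD' 2>/dev/null
}

# Exit status 0 when at least one commit has not been pushed,
# 1 when there is nothing to push, 2 when the count could not be determined.
has_unpushed() {
    local n
    n=$(unpushed_count) || return 2
    [ "$n" -gt 0 ]
}
```

A caller could then write: if has_unpushed; then echo "push needed"; fi. Nothing is printed unless the caller asks for the count, and like the commands above the result is only as fresh as the last fetch.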

Suppose the branch is foo.
git fetch origin foo
git rev-list FETCH_HEAD..foo --count
The first command fetches the latest head of foo from the remote repository and stores its commit SHA-1 in FETCH_HEAD.
git rev-list --count prints a number. If it's 0, every commit on the local foo is already included in foo in the remote repository. If it's greater than 0, that many commits are not yet included in the remote foo.
Being included is different from being pushed if the procedure involves a pending change like a pull request or a merge request. These commands determine whether the local branch's commits have been merged to (included in) its remote counterpart.

You're looking for git status. It will give you output like:
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
And you can grep for this with:
git status | grep -E "Your branch is ahead of '.*' by ([0-9]*) commit."
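The human-readable message is written for people and can vary with the Git version or the locale. As a hedged alternative, the sketch below reads the script-oriented output of git status --porcelain=v2 --branch, which reports the counts in a "# branch.ab +<ahead> -<behind>" header line when an upstream is set. The function names are hypothetical.

```shell
#!/usr/bin/env bash

# Read `git status --porcelain=v2 --branch` output on stdin and print the
# "ahead" count taken from the "# branch.ab +<ahead> -<behind>" header.
# Prints nothing when the header is absent (for example, no upstream is set).
ahead_from_status() {
    awk '$1 == "#" && $2 == "branch.ab" { print substr($3, 2) }'
}

# Print how many commits the current branch is ahead of its upstream.
ahead_count() {
    git status --porcelain=v2 --branch | ahead_from_status
}
```

Splitting the parser from the git call keeps the parsing step easy to check against captured output.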

Related

Check whether a repository would be pushed in a shell script [duplicate]

How can I view any local commits I've made, that haven't yet been pushed to the remote repository? Occasionally, git status will print out that my branch is X commits ahead of origin/master, but not always.
Is this a bug with my install of Git, or am I missing something?
This gives a log of all commits between origin/master and HEAD:
git log origin/master..HEAD
When HEAD is on the master branch, this gives a log of unpushed commits.
Similarly, to view the diff:
git diff origin/master..HEAD
To see all commits on all branches that have not yet been pushed:
git log --branches --not --remotes
To see the most recent commit on each branch, as well as the branch names:
git log --branches --not --remotes --simplify-by-decoration --decorate --oneline
Show all commits that you have locally but not upstream with:
git log @{u}..
@{u} or @{upstream} means the upstream branch of the current branch (see git rev-parse --help or git help revisions for details).
git cherry -v
Taken from: Git: See all unpushed commits or commits that are not in another branch.
You can do this with git log:
git log origin/master..
This assumes that origin is the name of your upstream remote and master is the name of your upstream branch. Leaving off any revision name after .. implies HEAD, which lists the new commits that haven't been pushed.
All the other answers talk about "upstream" (the branch you pull from).
But a local branch can push to a different branch than the one it pulls from.
A master might not push to the remote-tracking branch "origin/master".
The upstream branch for master might be origin/master, but it could push to the remote tracking branch origin/xxx or even anotherUpstreamRepo/yyy.
Those are set by branch.*.pushremote for the current branch along with the global remote.pushDefault value.
It is that remote-tracking branch that counts when seeking unpushed commits: the one that tracks the branch at the remote where the local branch would be pushed to.
The branch at the remote can be, again, origin/xxx or even anotherUpstreamRepo/yyy.
Git 2.5+ (Q2 2015) introduces a new shortcut for that: <branch>@{push}
See commit 29bc885, commit 3dbe9db, commit adfe5d0, commit 48c5847, commit a1ad0eb, commit e291c75, commit 979cb24, commit 1ca41a1, commit 3a429d0, commit a9f9f8c, commit 8770e6f, commit da66b27, commit f052154, commit 9e3751d, commit ee2499f [all from 21 May 2015], and commit e41bf35 [01 May 2015] by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit c4a8354, 05 Jun 2015)
Commit adfe5d0 explains:
sha1_name: implement @{push} shorthand
In a triangular workflow, each branch may have two distinct points of interest: the @{upstream} that you normally pull from, and the destination that you normally push to. There isn't a shorthand for the latter, but it's useful to have.
For instance, you may want to know which commits you haven't pushed yet:
git log @{push}..
Or as a more complicated example, imagine that you normally pull changes from origin/master (which you set as your @{upstream}), and push changes to your fork (e.g., as myfork/topic).
You may push to your fork from multiple machines, requiring you to integrate the changes from the push destination, rather than upstream.
With this patch, you can just do:
git rebase @{push}
rather than typing out the full name.
Commit 29bc885 adds:
for-each-ref: accept "%(push)" format
Just as we have "%(upstream)" to report the "@{upstream}" for each ref, this patch adds "%(push)" to match "@{push}".
It supports the same tracking format modifiers as upstream (because you may want to know, for example, which branches have commits to push).
If you want to see how many commits your local branches are ahead/behind compared to the branch you are pushing to:
git for-each-ref --format="%(refname:short) %(push:track)" refs/heads
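For the current branch only, the same idea can be reduced to a count. This is a small sketch that assumes Git 2.5 or later and a push destination that Git can resolve for the branch; the function name is made up for illustration.

```shell
#!/usr/bin/env bash

# Print the number of commits on the current branch that are missing from
# the branch it would be pushed to (@{push}), which may differ from @{u}.
# Fails (non-zero status) when no push destination can be resolved.
unpushed_to_push_target() {
    git rev-list --count '@{push}..HEAD' 2>/dev/null
}
```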
I had a commit done previously, not pushed to any branch, nor remote nor local. Just the commit. Nothing from other answers worked for me, but with:
git reflog
There I found my commit.
Handy git alias for looking for unpushed commits in current branch:
alias unpushed = !GIT_CURRENT_BRANCH=$(git name-rev --name-only HEAD) && git log origin/$GIT_CURRENT_BRANCH..$GIT_CURRENT_BRANCH --oneline
What this basically does:
git log origin/branch..branch
but also determines current branch name.
You could try....
gitk
I know it is not a pure command line option but if you have it installed and are on a GUI system it's a great way to see exactly what you are looking for plus a whole lot more.
(I'm actually kind of surprised no one mentioned it so far.)
git branch -v will show, for each local branch, whether it's "ahead" or not.
I use the following alias to get just the list of files (and the status) that have been committed but haven't been pushed (for the current branch)
git config --global alias.unpushed \
'!git diff origin/$(git name-rev --name-only HEAD)..HEAD --name-status'
then just do:
git unpushed
I believe the most typical way of doing this is to run something like:
git cherry --abbrev=7 -v @{upstream}
However, I personally prefer running:
git log --graph --decorate --pretty=oneline --abbrev-commit --all @{upstream}^..
which shows the commits from all branches which are not merged upstream, plus the last commit in upstream (which shows up as a root node for all the other commits). I use it so often that I have created alias noup for it.
git config --global alias.noup \
'log --graph --decorate --pretty=oneline --abbrev-commit --all @{upstream}^..'
git cherry -v
This will list your local commit history (not yet pushed) with the corresponding messages.
I suggest you look at the script https://github.com/badele/gitcheck. I wrote it to check all your git repositories in one pass; it shows which ones have uncommitted changes and which have commits not yet pushed or pulled.
It is not a bug. What you are probably seeing is git status after a failed auto-merge where the changes from the remote are fetched but not yet merged.
To see the commits between local repo and remote do this:
git fetch
This is 100% safe and will not mess up your working copy. If there were changes, git status will show X commits ahead of origin/master.
You can now show log of commits that are in the remote but not in the local:
git log HEAD..origin
This worked better for me:
git log --oneline @{upstream}..
or:
git log --oneline origin/(remotebranch)..
There is a tool named unpushed that scans all Git, Mercurial and Subversion repos in a specified working directory and shows a list of uncommitted files and unpushed commits.
Installation is simple under Linux:
$ easy_install --user unpushed
or
$ sudo easy_install unpushed
to install system-wide.
Usage is simple too:
$ unpushed ~/workspace
* /home/nailgun/workspace/unpushed uncommitted (Git)
* /home/nailgun/workspace/unpushed:master unpushed (Git)
* /home/nailgun/workspace/python:new-syntax unpushed (Git)
See unpushed --help or the official description for more information. It also has a cronjob script, unpushed-notify, for on-screen notification of uncommitted and unpushed changes.
To list all unpushed commits in all branches easily you can use this command:
git log --branches @{u}..
If the number of commits that have not been pushed out is a single-digit number, which it often is, the easiest way is:
$ git checkout
git responds by telling you that you are "ahead N commits" relative to your origin. So now just keep that number in mind when viewing logs. If you're "ahead by 3 commits", the top 3 commits in the history are still private.
Similar: To view unmerged branches:
git branch --all --no-merged
Those can be suspect but I recommend the answer by cxreg
One way of doing things is to list commits that are available on one branch but not another.
git log ^origin/master master
I'm really late to the party, and I'm not sure when it was implemented, but to see what a git push would do, just use the --dry-run option:
$ git push --dry-run
To ssh://bitbucket.local.lan:7999/qarepo/controller.git
540152d1..21bd921c imaging -> imaging
As said above:
git diff origin/master..HEAD
But if you are using git gui
After opening gui interface, Select "Repository"->Under that "Visualize History"
Note: Some people like to use CMD Prompt/Terminal while some like to use Git GUI (for simplicity)
If you have git submodules...
Whether you do git cherry -v or git log @{u}.. -p, don't forget to include your submodules via
git submodule foreach --recursive 'git log @{u}..'.
I am using the following bash script to check all of that:
unpushedCommitsCmd="git log @{u}.."; # Source: https://stackoverflow.com/a/8182309
# check if there are unpushed changes
if [ -n "$($unpushedCommitsCmd)" ]; then # Check Source: https://stackoverflow.com/a/12137501
echo "You have unpushed changes. Push them first!"
$unpushedCommitsCmd;
exit 2
fi
unpushedInSubmodules="git submodule foreach --recursive --quiet ${unpushedCommitsCmd}"; # Source: https://stackoverflow.com/a/24548122
# check if there are unpushed changes in submodules
if [ -n "$($unpushedInSubmodules)" ]; then
echo "You have unpushed changes in submodules. Push them first!"
git submodule foreach --recursive ${unpushedCommitsCmd} # not "--quiet" this time, to display details
exit 2
fi
Here's my portable solution (shell script which works on Windows too without additional install) which shows the differences from origin for all branches: git-fetch-log
An example output:
==== branch [behind 1]
> commit 652b883 (origin/branch)
| Author: BimbaLaszlo <bimbalaszlo@gmail.com>
| Date: 2016-03-10 09:11:11 +0100
|
| Commit on remote
|
o commit 2304667 (branch)
Author: BimbaLaszlo <bimbalaszlo@gmail.com>
Date: 2015-08-28 13:21:13 +0200
Commit on local
==== master [ahead 1]
< commit 280ccf8 (master)
| Author: BimbaLaszlo <bimbalaszlo@gmail.com>
| Date: 2016-03-25 21:42:55 +0100
|
| Commit on local
|
o commit 2369465 (origin/master, origin/HEAD)
Author: BimbaLaszlo <bimbalaszlo@gmail.com>
Date: 2016-03-10 09:02:52 +0100
Commit on remote
==== test [ahead 1, behind 1]
< commit 83a3161 (test)
| Author: BimbaLaszlo <bimbalaszlo@gmail.com>
| Date: 2016-03-25 22:50:00 +0100
|
| Diverged from remote
|
| > commit 4aafec7 (origin/test)
|/ Author: BimbaLaszlo <bimbalaszlo@gmail.com>
| Date: 2016-03-14 10:34:28 +0100
|
| Pushed remote
|
o commit 0fccef3
Author: BimbaLaszlo <bimbalaszlo@gmail.com>
Date: 2015-09-03 10:33:39 +0200
Last common commit
Parameters passed for log, e.g. --oneline or --patch can be used.
git show
will show all the diffs in your local commits.
git show --name-only
will show the local commit id and the name of commit.
git diff origin
Assuming your branch is set up to track the origin, then that should show you the differences.
git log origin
Will give you a summary of the commits.

What bash command can be used to tell if a local git repository is out of sync with its remote

Searching in StackOverflow and on Google, I can find lots of items about how to sync, and do everything with git, except I can't find a reliable way using a bash script to tell if my local repository is in sync with its remote. I've tried:
1.
git branch -v |
perl -wlne'
print "$ENV{reponame} [$1] --> $3 $2"
if /^..(\S+)\s+([a-f0-9]+)\s+(\[(?:ahead|behind)\s+\d+\])/
' |
while IFS= read -r MOD; do
printf ' %s\n' "$MOD" # Replace with code that uses $MOD
done
and,
2.
isInSync=`git remote update>/dev/null; git status -uno | grep -o "is up to date"`
if [ $? -ne 0 ]; then
echo " $reponame [$br] --> Out of sync with remote at $d"
ok=false
fi
and have been trying to find a way to use:
git log --oneline | head -n1 | sed # this I don't know.
to try to get the first word in the log file, which is the commit hash of the last commit, and compare it with:
rev=$(git rev-parse --short HEAD)
which is the commit hash of the local repository branch you are in.
The problem with #1 is that it doesn't seem to pick up when the local is out of sync with the remote.
The problem with # 2, is that it causes the local .git/config to get involved and produces odd attempts to access different remote repositories like heroic.
The problem with #3 is that I can't figure out how to get the hash code from the git log and then compare it to the $rev above. It would seem to be the best bet as when I check it on different computers in different states, it seems to convey the right information.
I am writing a bash script that checks a group of git projects and tells me their states, i.e. up to date, untracked files, uncommitted files, and out of sync with the remote.
Help would be appreciated in either suggesting a better way or how to do the extraction of the commit-hash from the log and compare it to the current commit-hash of the local last commit.
As you are seeing, you have to define what you mean by in sync.
A Git repository first and foremost is a way to hold commits. The set of commits in the repository, as found by branch and tag names and other such names, is what is in the repository. The commits are the history.
Most Git repositories that users work with, however, are not "bare". They have a work-tree or working tree into which files can be extracted from any commit, and/or additional files created and/or various files modified and so on. This working area can be "clean" (matches a commit, more or less) or "dirty".
Is a "dirty" repository, with lots of work going on inside it, "in sync" with some other repository, even if both repositories have exactly the same set of commits? Or are these "not in sync"? What if the other repository has a work-tree and it's "dirty" in exactly the same way? That's something you need to define.
(Besides the work-tree, all repositories—even bare ones—have an index as well, which can also be "clean" or "dirty", and perhaps you should factor that in as well.)
A repository can have one or more remotes defined as well. Given a remote name, such as origin, Git can be told: connect to some network URL and obtain new commits from a Git over there, and put them into this repository. That's what your git remote update is doing. How many remotes that contacts—it could get some of them, or all of them, or maybe some are unreachable at the moment—is difficult to answer, as this is all quite configurable.
... [get] the commit hash of the last commit
Each branch name automatically holds the hash ID of the last commit in that branch. There can be more than one "last commit", in other words. Using HEAD is the right way to find the hash ID of the current commit, but this may not be the tip commit of any branch:
rev=$(git rev-parse HEAD)
If HEAD contains the name of an unborn branch, this step will fail. That state is the case in any totally-empty repository (because there are no commits, hence there can be no branch names; branch names are required to name some existing commit). It's also the state after a git checkout --orphan operation, however.
I am writing a bash script that checks a group of git projects and tells me their states, i.e. up to date, untracked files, uncommitted files, and out of sync with the remote.
So, you get to choose how to define each of these things.
In general, I would:
Optionally, have each repository contact its main upstream(s), whatever those may be: probably those defined by whatever git remote update does; consider here whether it's good or bad to allow --prune (see also the fetch.prune setting).
Check at least the current branch, as reported by git symbolic-ref HEAD: if this command fails, there is no current branch, i.e., we're on a detached HEAD.
Check the status as reported by git status --porcelain=v2 or similar, to look at the state of the index and work-tree. Consider checking submodule status here as well; git status may do this for you, or not, depending on settings.
Use the current branch's upstream setting, if (a) there is a current branch and (b) it has an upstream. This upstream setting is often the name of a branch in another Git repository. (It can instead be the name of a branch in this repository.) If so, use git rev-list --left-right --count branch...branch@{u} to count the number of commits ahead and/or behind, perhaps after the git remote update with or without --prune.
Optionally, check each branch name, keeping in mind that each Git repository has its own branch names and there's no particular reason, other than convenience and convention, to use the same names in two different Git repositories. That is, my dev might not be related to origin/dev, for instance. If my dev goes with origin/develop I probably set the upstream of dev to origin/develop, so consider checking each branch's upstream. Note that git branch -vv does this (and also counts ahead/behind values, unless told not to, and also has extra support for added work-trees now).
Except for current branch and dirtiness (which git status already reports), most of the work is just git remote update -p and git branch -vv, really.
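The per-repository checks listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not a complete tool: it assumes any git remote update has already been run by the caller, it ignores submodules, and the function name repo_state and its output format are invented here.

```shell
#!/usr/bin/env bash

# Print a one-line summary for the repository in the current directory.
repo_state() {
    local branch dirty counts ahead behind
    # symbolic-ref fails on a detached HEAD, where there is no current branch.
    if ! branch=$(git symbolic-ref --quiet --short HEAD); then
        echo "detached HEAD"
        return 0
    fi
    # Any porcelain output means the index or work-tree is not clean.
    dirty=clean
    if [ -n "$(git status --porcelain)" ]; then
        dirty=dirty
    fi
    # Left count: commits only on the branch. Right count: only on its upstream.
    if counts=$(git rev-list --left-right --count "$branch...$branch@{u}" 2>/dev/null); then
        read -r ahead behind <<<"$counts"
        echo "$branch $dirty ahead=$ahead behind=$behind"
    else
        echo "$branch $dirty no-upstream"
    fi
}
```

A wrapper script could cd into each project directory in turn and print the result of repo_state next to the project name.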
In the above code for git_check, I replace the section:
2.
isInSync=`git remote update>/dev/null; git status -uno | grep -o "is up to date"`
if [ $? -ne 0 ]; then
echo " $reponame [$br] --> Out of sync with remote at $d"
ok=false
fi
with:
last_commit=`git log --oneline | head -n1 | grep -o "^\w*\b"`
rev=$(git rev-parse --short HEAD)
if [ "$last_commit" != "$rev" ]; then
echo " $reponame [$br] --> Out of sync with remote at $d"
ok=false
fi
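Note that both values in this replacement are derived from the local HEAD (the first word of git log --oneline is the abbreviated hash of HEAD), so they would normally match regardless of what the remote holds. A hedged sketch of a comparison against the upstream tip instead, assuming the branch has an upstream and a fetch has already been done, with a made-up function name:

```shell
#!/usr/bin/env bash

# Succeed when the current commit is exactly the tip of the upstream branch.
# Returns 2 when HEAD or the upstream cannot be resolved.
# Does not contact the network; it compares against the last fetched state.
in_sync_with_upstream() {
    local local_rev upstream_rev
    local_rev=$(git rev-parse HEAD 2>/dev/null) || return 2
    upstream_rev=$(git rev-parse '@{u}' 2>/dev/null) || return 2
    [ "$local_rev" = "$upstream_rev" ]
}
```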
Here is the Console output with some comments
Modify a file that has been committed.
[:~/bin] develop(+4/-2) 2s ± git status
On branch develop
Your branch is up to date with 'origin/develop'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: git_check
no changes added to commit (use "git add" and/or "git commit -a")
[:~/bin] develop(+4/-2) ± git_check
bin [develop] --> Changes to be staged/committed at /Users/userid/bin
[:~/bin] develop(+4/-2) 128 ± git add -A
[:~/bin] develop(+0/-0) ± git diff --cached --shortstat
1 file changed, 4 insertions(+), 2 deletions(-)
[:~/bin] develop(+0/-0) ± git_check
bin [develop] --> Changes to be committed at /Users/userid/bin
[:~/bin] develop(+0/-0) 2s ± git commit -m "Better way to check for remote sync"
[develop fab4f1d] Better way to check for remote sync
1 file changed, 4 insertions(+), 2 deletions(-)
[:~/bin] develop(1) ± git_check
OK --> bin [develop] fab4f1d
bin [develop] --> [ahead 1] fab4f1d
[:~/bin] develop(1) 2s ± git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 4 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 416 bytes | 416.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/userid/repo.git
ee0f0ca..fab4f1d develop -> develop
[:~/bin] develop 4s ± git_check
OK --> bin [develop] fab4f1d
For me, at least, this appears to solve my problem. As the console output shows, it gives me information about what is happening at each stage of the workflow. Especially, it tells me that my local repository is ahead of the remote repository by 1, telling me I need to do a push. On the other machines, I will see that they are behind by 1, telling me I need to do a pull to sync up.
Thanks to torek for the helpful information.

Error when creating all branches at once

I have created this bash script to create all branches at once
#!/bin/bash
git fetch -vp
for b in $(git branch -a | grep remotes | grep -v HEAD)
do
branchname=${b##*/}
remote=${b#*/}
command="git branch --track $branchname $remote"
echo "$command"
$($command)
done
but I am always having the same error:
fatal: 'master' is not a valid branch name.
If I run the same command without the script then it is executed successfully.
What am I doing wrong?
I have created this script because I need to push all my branches to another remote repo, so first I need to create all the local branches from the original repository...
You don't.
You can use the remote tracking branches which you already have from git fetch. That's like origin/master. git fetch has already downloaded all the commits from the remote, and branches in Git are just labels on commits. Even remote branches.
You can get a list of all your remote tracking branches from git branch -r but that's from all remotes. To get the branches for just one remote use git ls-remote --heads <remote>. The format is a little funny, so you have to do some massaging.
$ git ls-remote --heads $REMOTE | cut -f2 | sed -E "s/refs\/heads/$REMOTE/"
origin/80_method_methods
origin/gh-pages
origin/io-all
origin/issue/217
origin/issue/255
origin/master
origin/rewrite_cmd_wrapper
Then you can push from those branches.
Though wanting to push all the branches from one repo to another is very odd. There's probably a simpler way to solve whatever problem you're trying to solve. I suspect you have an XY Problem.
I would suggest asking a question about that problem instead.
As is well explained in @Schwern's answer, wanting to "checkout all branches" of a remote repo is probably an XY problem, and it is unneeded in practice because git fetch $REMOTE is enough to retrieve the whole history of the considered repo.
However the proposed command git ls-remote ... does not seem to be appropriate because:
it needs a network connection
it doesn't display the list of remote branches
it just queries the remote repo (without relying on fetched data) to know if the commits corresponding to local branches happen to be known in the remote.
In order to efficiently get the list of all branches from a given remote, I propose instead:
git for-each-ref --format "%(refname:strip=2)" "refs/remotes/$REMOTE/" | grep -v -e '/HEAD$'
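Building on that listing, the original goal (pushing every branch to a second remote) can be sketched without creating any local branches. This is only an illustration: the function name and its two arguments (the source and destination remote names) are assumptions, and it uses strip=3 rather than strip=2 so that the remote name is dropped from each entry.

```shell
#!/usr/bin/env bash

# Push every branch already fetched from remote $1 to remote $2,
# using only the remote-tracking refs (no local branches are created).
push_all_branches() {
    local src=$1 dest=$2 name
    while IFS= read -r name; do
        # Skip the symbolic <remote>/HEAD entry, which is not a branch.
        if [ "$name" = HEAD ]; then
            continue
        fi
        git push --quiet "$dest" "refs/remotes/$src/$name:refs/heads/$name" || return 1
    done < <(git for-each-ref --format='%(refname:strip=3)' "refs/remotes/$src/")
}
```

For example, after git fetch origin, running push_all_branches origin backup would publish origin's branches to a remote named backup (a hypothetical name).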

How to extract columns about current commit in git local and remote repos

A local git repository has checked out code from a remote git repository. I am able to identify the current branch of the local git repository by typing in the following command on the local server:
git branch -vv
* Issue_Example c167ce9 [origin/Issue_Example] who is here right now?
master cf60eb7 [origin/master] Initial Commit
As you can see from the results above, the current branch is indicated using an * symbol. Also, each line in the results contains the following columns (I have placed the values for the current branch alongside each column below to make it blindingly clear):
local branch = "Issue_Example"
commit hash = c167ce9
remote branch linked to local branch: "origin/Issue_Example"
description of commit: "who is here right now? "
How can I filter the results of the git branch -vv command to return only each column individually? For example:
First desired command:
Get local branch name of current branch:
git branch -vv --current-branch-only --local-name-only
Would print out "Issue_Example"
Second desired command:
Get commit hash for the current branch:
git branch -vv --current-branch-only --commit-hash-only
Would print out "c167ce9"
Third desired command:
Get name of remote branch linked to current branch:
git branch -vv --current-branch-only --name-of-remote-branch-linked-to-local-branch-only
Would print out "origin/Issue_Example"
Fourth desired command:
Get description of commit:
git branch -vv --current-branch-only --description-of-commit-only
Would print out "who is here right now?"
What actual syntax would be required to retrieve the information specified above? This is on a CentOS server, so we are using bash scripting if scripting is necessary.
Name of current branch
git rev-parse --abbrev-ref HEAD
Hash of currently checked out commit (branch)
git rev-parse HEAD # full hash
git rev-parse --short HEAD # short hash
Remote tracking branch (upstream) of current branch
git rev-parse --abbrev-ref @{upstream}
Commit message subject of currently checked out commit
git log -1 --format="%s"
More info at man git-rev-parse, man git-log.
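If all four columns are wanted at once, one possible approach is a single git for-each-ref call on the current branch ref. This is a sketch under the assumption that HEAD is on a branch; the function name and the tab-separated layout are choices made for this example.

```shell
#!/usr/bin/env bash

# Print "branch<TAB>short-hash<TAB>upstream<TAB>subject" for the current branch.
# Fails on a detached HEAD, where there is no current branch ref.
# The upstream field is empty when no upstream is configured.
current_branch_info() {
    local ref
    ref=$(git symbolic-ref --quiet HEAD) || return 1
    git for-each-ref \
        --format='%(refname:short)%09%(objectname:short)%09%(upstream:short)%09%(contents:subject)' \
        "$ref"
}
```

Individual columns can then be picked out with cut, e.g. current_branch_info | cut -f3 for the upstream name.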

Checkout remote git branch if exists else clone master make new branch and push

Sorry for the verbose title.
In fastlane, using the programming language Ruby, I want to have this functionality:
if remote_branch_exist
git_clone_remote_branch
else
git_clone_master
git_branch
git_push_branch_to_master
end
I have searched for a one liner git command that does that, but not succeeded. Is it possible?
I have written this code that does exactly what I want. But it must surely be an unnecessary amount of code.
def git_clone_sdk_repo(path_repo: nil)
some_branch = "some_branch"
git_url = "git@github.com:MyComp/MyRepo.git"
if check_if_remote_branch_exists(git_url: git_url, branch_name: some_branch)
puts "remote branch exists"
sh "git clone -b #{some_branch} #{git_url} #{path_repo}"
else
puts "no remote branch"
sh "git clone #{git_url} #{path_repo}"
pwd = Dir.pwd
FileUtils.cd(path_repo)
sh "git checkout -b #{some_branch}"
sh "git push --set-upstream origin #{some_branch}"
FileUtils.cd(pwd)
end
end
def check_if_remote_branch_exists(git_url: nil, branch_name: nil)
check_if_remote_branch_exists = "git ls-remote --heads #{git_url} #{branch_name} | wc -l | grep -o -q '1'"
system(check_if_remote_branch_exists)
end
(The method sh in the code block above is used to call CLI commands. I think it is part of fastlane.)
Running this command:
git clone -b <some_branch> <git_url> <path_repo>
Results in:
fatal: Remote branch <some_branch> not found in upstream origin
If there is no branch in the remote with that name. So that is why I am first checking if there is a remote branch with such a name.
What neat git command am I missing?
Let me re-express this task in Git terms, rather than as Ruby code.
You wish to:
Clone a repository from some URL. We will then save that URL under the usual "remote" name, origin.
Given a branch name such as foo, check out that particular branch (so that the current commit is the tip commit of that branch).
If the branch can be derived from a remote-tracking branch, as (e.g.) is usually true for master which usually derives from origin/master—you want Git to create this branch locally, with the corresponding remote-tracking branch set as its upstream, ready to do work on it. Hence if branch foo exists in the Git repository on origin so that origin/foo will exist in the local repository, you want to create local branch foo with origin/foo as its upstream.
If not, however—if there is no corresponding upstream name, so that at the moment, the branch is going to be a new branch—you want to create that new branch such that it points to the same commit that origin/master will point-to. In this case, you then also want to immediately (or as quickly as possible) request that the Git on origin also create this branch-name, pointing to that very same commit, and on success, set foo to have origin/foo as its upstream. Ideally, the end result of this process is that local branch foo exists and has origin/foo as its upstream.
You have observed that if foo exists on the remote, git clone -b foo <url> <directory> does the trick in one clean step (although as a side effect, the local clone will not have a master branch yet!). If foo does not exist on the remote, though, the clone fails.
Unfortunately, there is no single Git command that can do all this. Moreover, there is an atomicity issue here ("atomicity" having its usual meaning in database or parallel programming terms): the fact that foo does not exist during the cloning step does not mean that foo will not exist by the time you ask the upstream repository to create it.
The "best" answer to all of this depends on how much you care about this atomicity problem (solving it generally just moves atomicity issues to a later push step, since branch foo could be removed on the server by then, or have acquired extra commits, or been rewound and rewritten, or whatever). But in the end you must use multiple Git commands.
Method 1
The sequence that uses the least network traffic is to clone without -b. In this case, your clone will check out some branch all on its own—usually master, but the actual branch chosen will depend on what is in the HEAD entry for the Git at the URL that will be stored in the remote. Your clone will then have the remote's URL saved as usual, under the name origin (or any -o argument you supply).
Now you can simply attempt to git checkout foo. Either foo is already the current branch (because it was in HEAD on the remote), so that this is a successful no-op; or foo isn't the current branch. If foo is not the current branch, this will create foo as a local branch with origin/foo set as its upstream if and only if origin/foo exists. This origin/foo will in turn exist if and only if a branch named foo existed on the remote at the time you did the clone (see "atomicity").
If the git checkout fails, you can assume that origin/foo does not exist. (The only other possibility is that things are going very badly wrong, e.g., you have run out of disk space or the storage device is failing, or there are bugs in Git: in both cases all bets are off.) You can at this point go down your "create foo pointing to the same commit as origin/master and use git push -u to ask to create it on origin too" path, and verify that this all works. As usual with git push, you are now racing against anyone else creating foo. Note also that there may not be an origin/master in your own repository, if there was no master on the other Git at the time you did the clone.
Method 2
You can use git ls-remote as you are doing now, which does one complete round-trip operation to the remote (currently via URL, since there is as yet no local clone, hence no remote named origin to store that URL) to determine the set of references it has. If foo does not exist in that repository, you can ask that Git to create it. You can do this a little bit differently, if you like, using a series of local Git operations in a new repository that, as yet, has nothing at all in it:
mkdir <directory>
cd <directory>
git init
git remote add origin <url>
At this point you can run git ls-remote origin, because now there is a remote named origin. However, there are no local branches at all. Now we run into the usual atomicity issues, and "what to do next" depends once again on how you wish to solve them. But if I were not using method 1 or some slight variant of it, this is what I would do next:
# assumes $branch is set to "foo" as needed, and that
# function "die" prints an error message and exits with failure
git fetch origin # bring over all commits and origin/* branches
if branchrev=$(git rev-parse -q --verify origin/$branch); then
# origin/$branch exists, so we want to act like "git clone -b $branch"
git checkout $branch ||
die "unable to check out $branch, cannot proceed"
else
# origin/$branch does not exist: ask to create it pointing to
# origin/master
rev=$(git rev-parse -q --verify origin/master) ||
die "no origin/master exists, cannot proceed"
git checkout -b $branch $rev ||
die "failed to create $branch"
git push -u origin "$branch:refs/heads/$branch" ||
die "failed to create $branch on origin"
fi
The git checkout -b creates the branch in the local repository, and sets it as the current branch. Since the initial commit ID is given by raw commit hash (due to $rev containing the result from git rev-parse), it will have no upstream. You could instead use git checkout -b $branch origin/master but this will set the upstream for the new branch to origin/master, leaving a trap for the unwary if the git push -u fails for some reason (e.g., network failure). You could use git checkout --no-track -b $branch origin/master, but given the test to make sure origin/master is a valid name, we might as well save the hash ID in $rev and use that.
This same bit of shell script—which you could rewrite in Ruby if you like—can be used after a regular old git clone, instead of using the somewhat obscure git init; git remote add ...; git fetch sequence that does everything git clone would do except for the initial git checkout of whichever branch the remote's HEAD indicates.
(In other words, in practice, I'd just run git clone—without the tricky -b part—first, then do everything in the shell script section above except the git fetch step, which is generally unnecessary right after the clone step. If the clone will take a very long time, the extra git fetch might still be useful, since that will then shrink the atomicity race, at the cost of one more round-trip to the server at origin. Nothing can completely close the race, though.)
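Put together, that "plain clone first" variant might look like the sketch below. It is an illustration only: die is one possible definition of the helper assumed earlier, clone_with_branch and its three arguments are invented names, and the race described above still applies. The function body runs in a subshell so the caller's working directory is left alone.

```shell
#!/usr/bin/env bash

# Print an error message and exit with failure.
die() {
    echo "error: $*" >&2
    exit 1
}

# Clone URL $1 into directory $2, then make sure branch $3 is checked out.
# If origin has no such branch, create it from origin/master and publish it.
clone_with_branch() (
    url=$1 dir=$2 branch=$3
    git clone --quiet "$url" "$dir" || die "clone failed"
    cd "$dir" || die "cannot enter $dir"
    # Succeeds if the branch is already current or can be derived from origin/$branch.
    if ! git checkout --quiet "$branch" 2>/dev/null; then
        rev=$(git rev-parse -q --verify origin/master) ||
            die "no origin/master exists, cannot proceed"
        git checkout --quiet -b "$branch" "$rev" ||
            die "failed to create $branch"
        git push --quiet -u origin "$branch:refs/heads/$branch" ||
            die "failed to create $branch on origin"
    fi
)
```

Because the body is a subshell, a die inside it ends only that subshell, and the caller sees a non-zero status from clone_with_branch.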

Resources