I accidentally dropped a DVD-rip into a website project, then carelessly ran git commit -a -m ..., and, zap, the repo was bloated by 2.2 gigs. Then I made some edits, deleted the video file, and committed everything, but the compressed file is still there in the repository's history.
I know I can start branches from those commits and rebase one branch onto another. But what should I do to merge the 2 commits so that the big file doesn't show in the history and is cleaned in the garbage collection procedure?
Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.
Carefully follow the usage instructions; the core part is just this:
$ java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git
Any files over 100MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
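To confirm that the gc step actually reclaimed the space, you can compare the size of the object store before running the BFG and after the gc. Here is a minimal sketch; repo_object_kib is a hypothetical helper name (not part of the BFG), and it simply sums the size and size-pack values that git count-objects -v reports in KiB:

```shell
# repo_object_kib is a hypothetical helper, not part of the BFG.
# It prints the disk space (KiB) used by loose plus packed objects,
# summing the "size:" and "size-pack:" lines of `git count-objects -v`.
repo_object_kib() {
  git -C "${1:-.}" count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Usage: run before the BFG and again after `git gc`, e.g.
#   repo_object_kib my-repo.git
```

If the number barely drops after the gc, something probably still references the big blob, such as a reflog entry, a stash, or another branch or tag.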
After pruning, we can force push to the remote repo*
$ git push --force
*NOTE: you cannot force push to a protected branch on GitHub
The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
What you want to do is highly disruptive if you have published history to other developers. See “Recovering From Upstream Rebase” in the git rebase documentation for the necessary steps after repairing your history.
You have at least two options: git filter-branch and an interactive rebase, both explained below.
Using git filter-branch
I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.
Say your git history is:
$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A login.html
* cb14efd Remove DVD-rip
| D oops.iso
* ce36c98 Careless
| A oops.iso
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
Note that git lola is a non-standard but highly useful alias. (See the addendum at the end of this answer for details.) The --name-status switch to git log shows tree modifications associated with each commit.
In the “Careless” commit (whose SHA1 object name is ce36c98) the file oops.iso is the DVD-rip added by accident and removed in the next commit, cb14efd. Using the technique described in the aforementioned blog post, the command to execute is:
git filter-branch --prune-empty -d /dev/shm/scratch \
--index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
--tag-name-filter cat -- --all
Options:
--prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
-d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a tree in /dev/shm will result in faster execution.
--index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.
--tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
-- specifies the end of options to git filter-branch.
--all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (master), but I included this option for full generality.
After some churning, the history is now:
$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A login.html
* e45ac59 Careless
| A other.html
|
| * f772d66 (refs/original/refs/heads/master) Login page
| | A login.html
| * cb14efd Remove DVD-rip
| | D oops.iso
| * ce36c98 Careless
|/ A oops.iso
| A other.html
|
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
Notice that the new “Careless” commit adds only other.html and that the “Remove DVD-rip” commit is no longer on the master branch. The branch labeled refs/original/refs/heads/master contains your original commits in case you made a mistake. To remove it, follow the steps in “Checklist for Shrinking a Repository.”
$ git update-ref -d refs/original/refs/heads/master
$ git reflog expire --expire=now --all
$ git gc --prune=now
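To check whether anything still references oops.iso after these steps, you can ask git log for commits touching that path across all refs. A minimal sketch (path_in_history is a hypothetical helper name): because --all includes refs/original/*, it keeps succeeding until that backup ref is deleted, while reflog entries are not covered at all, which is why the reflog expire step is still needed.

```shell
# path_in_history is a hypothetical helper: it succeeds (exit 0) if any
# commit reachable from any ref, including refs/original/*, touches $1.
# Reflogs are NOT inspected.
path_in_history() {
  [ -n "$(git log --all --format=%H -1 -- "$1")" ]
}

# Example:
#   path_in_history oops.iso && echo "oops.iso is still referenced"
```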
For a simpler alternative, clone the repository to discard the unwanted bits.
$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo
Using a file:///... clone URL copies objects rather than creating hardlinks only.
Now your history is:
$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A login.html
* e45ac59 Careless
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
The SHA1 object names for the first two commits (“Index” and “Admin page”) stayed the same because the filter operation did not modify those commits. “Careless” lost oops.iso and “Login page” got a new parent, so their SHA1s did change.
Interactive rebase
With a history of:
$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A login.html
* cb14efd Remove DVD-rip
| D oops.iso
* ce36c98 Careless
| A oops.iso
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
you want to remove oops.iso from “Careless” as though you never added it, and then “Remove DVD-rip” is useless to you. Thus, our plan going into an interactive rebase is to keep “Admin page,” edit “Careless,” and discard “Remove DVD-rip.”
Running $ git rebase -i 5af4522 starts an editor with the following contents.
pick ce36c98 Careless
pick cb14efd Remove DVD-rip
pick f772d66 Login page
# Rebase 5af4522..f772d66 onto 5af4522
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#
Executing our plan, we modify it to
edit ce36c98 Careless
pick f772d66 Login page
# Rebase 5af4522..f772d66 onto 5af4522
# ...
That is, we delete the line with “Remove DVD-rip” and change the operation on “Careless” to be edit rather than pick.
Save-quitting the editor drops us at a command prompt with the following message.
Stopped at ce36c98... Careless
You can amend the commit now, with
git commit --amend
Once you are satisfied with your changes, run
git rebase --continue
As the message tells us, we are on the “Careless” commit we want to edit, so we run the following three commands.
$ git rm --cached oops.iso
$ git commit --amend -C HEAD
$ git rebase --continue
The first removes the offending file from the index. The second modifies or amends “Careless” to be the updated index and -C HEAD instructs git to reuse the old commit message. Finally, git rebase --continue goes ahead with the rest of the rebase operation.
This gives a history of:
$ git lola --name-status
* 93174be (HEAD, master) Login page
| A login.html
* a570198 Careless
| A other.html
* 5af4522 Admin page
| A admin.html
* e738b63 Index
A index.html
which is what you want.
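If you would rather script this plan than edit the todo list by hand, git rebase -i runs the program named in GIT_SEQUENCE_EDITOR on the todo file instead of opening an editor. The sketch below assumes sed is that program, that the commit subjects are simple enough to use inside sed regular expressions, and that rebase.instructionFormat is not customized (so each todo line ends with the subject); rebase_out_file is a hypothetical name.

```shell
# rebase_out_file is a hypothetical helper that replays the plan above:
#   $1 = base commit (5af4522 in the example)
#   $2 = file to remove from the edited commit (oops.iso)
#   $3 = subject of the commit to edit (Careless)
#   $4 = subject of the commit to discard (Remove DVD-rip)
# Subjects are used inside sed regular expressions, so keep them simple
# (no quotes, slashes or regex metacharacters).
rebase_out_file() {
  local base=$1 file=$2 edit_subject=$3 drop_subject=$4
  # Git runs GIT_SEQUENCE_EDITOR on the todo file: turn
  # "pick <hash> <edit_subject>" into "edit ..." and delete the line of
  # the commit to discard. The rebase then stops at the edited commit.
  GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick \(.*$edit_subject\)\$/edit \1/' -e '/$drop_subject\$/d'" \
    git rebase -i "$base" || return 1
  # Same three commands as the manual steps: unstage the file (it stays in
  # the working tree, untracked), amend reusing the old message, continue.
  git rm --cached --quiet -- "$file" &&
    git commit --amend --quiet -C HEAD &&
    git rebase --continue
}
```

For the history above this would be called as rebase_out_file 5af4522 oops.iso Careless 'Remove DVD-rip'. As with the manual steps, oops.iso is left in the working tree as an untracked file.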
Addendum: Enable git lola via ~/.gitconfig
Quoting Conrad Parker:
The best tip I learned at Scott Chacon’s talk at linux.conf.au 2010, Git Wrangling - Advanced Tips and Tricks was this alias:
lol = log --graph --decorate --pretty=oneline --abbrev-commit
This provides a really nice graph of your tree, showing the branch structure of merges etc. Of course there are really nice GUI tools for showing such graphs, but the advantage of git lol is that it works on a console or over ssh, so it is useful for remote development, or native development on an embedded board …
So, just copy the following into ~/.gitconfig for your full color git lola action:
[alias]
lol = log --graph --decorate --pretty=oneline --abbrev-commit
lola = log --graph --decorate --pretty=oneline --abbrev-commit --all
[color]
branch = auto
diff = auto
interactive = auto
status = auto
Why not use this simple but powerful command?
git filter-branch --tree-filter 'rm -f DVD-rip' HEAD
The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called DVD-rip from every snapshot, whether it exists or not.
If you know which commit introduced the huge file (say 35dsa2), you can replace HEAD with 35dsa2^..HEAD to avoid rewriting too much history, thus avoiding diverging commits if you haven't pushed yet. (Note the ^: a range A..B excludes A itself, so 35dsa2..HEAD would leave the file in the commit that introduced it.) This comment courtesy of @alpha_989 seems too important to leave out here.
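If you don't have the hash at hand, you can look up the oldest commit that added the file. A minimal sketch (first_commit_adding is a hypothetical helper); it assumes the file was not added in the root commit, since a root commit has no parent and "^" would fail:

```shell
# first_commit_adding is a hypothetical helper: print the hash of the
# oldest commit (on any ref) whose diff adds the path $1, or nothing.
first_commit_adding() {
  git log --all --diff-filter=A --format=%H -- "$1" | tail -n 1
}

# Example: start the rewrite at the parent of that commit, because a
# range A..B excludes A itself.
#   start=$(first_commit_adding DVD-rip)
#   git filter-branch --tree-filter 'rm -f DVD-rip' "$start^..HEAD"
```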
See this link.
(The best answer I've seen to this problem is: https://stackoverflow.com/a/42544963/714112 , copied here since this thread appears high in Google search rankings but that other one doesn't)
🚀 A blazingly fast shell one-liner 🚀
This shell script displays all blob objects in the repository, sorted from smallest to largest.
For my sample repo, it ran about 100 times faster than the other ones found here.
On my trusty Athlon II X4 system, it handles the Linux Kernel repository with its 5,622,155 objects in just over a minute.
The Base Script
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| awk '/^blob/ {print substr($0,6)}' \
| sort --numeric-sort --key=2 \
| cut --complement --characters=13-40 \
| numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
When you run the above code, you will get nice human-readable output like this:
...
0d99bb931299 530KiB path/to/some-image.jpg
2ba44098e28f 12MiB path/to/hires-image.png
bd1741ddce0d 63MiB path/to/some-video-1080p.mp4
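The base script relies on GNU coreutils (numfmt, cut --complement), which are not installed by default on macOS, for example. Here is a sketch using only git, awk and sort that also applies a minimum size and prints raw byte counts instead of human-readable ones; blobs_larger_than is a hypothetical name:

```shell
# blobs_larger_than is a hypothetical helper: list every blob reachable from
# any ref whose uncompressed size is at least $1 bytes, largest first, as
# "<bytes><TAB><path>".
blobs_larger_than() {
  local min_bytes=$1
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '
      $1 == "blob" && $3 + 0 >= min + 0 {
        # %(rest) is the path, which may contain spaces, so cut it out by
        # offset instead of using $4.
        path = substr($0, length($1) + length($2) + length($3) + 4)
        # Print the size as a string: %d can overflow for multi-GB blobs.
        printf "%s\t%s\n", $3, path
      }' |
    sort -n -r
}
```

For example, blobs_larger_than 100000000 lists blobs of 10^8 bytes (about 100 MB) and up.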
🚀 Fast File Removal 🚀
Suppose you then want to remove the files a and b from every commit reachable from HEAD. You can use this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch a b' HEAD
After trying virtually every answer in SO, I finally found this gem that quickly removed and deleted the large files in my repository and allowed me to sync again: http://www.zyxware.com/articles/4027/how-to-delete-files-permanently-from-your-local-and-remote-git-repositories
CD to your local working folder and run the following command:
git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch FOLDERNAME" -- --all
Replace FOLDERNAME with the file or folder you wish to remove from the given git repository.
Once this is done run the following commands to clean up the local repository:
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
Now push all the changes to the remote repository:
git push --all --force
This will clean up the remote repository.
100 times faster than git filter-branch and simpler
There are very good answers in this thread, but meanwhile many of them are outdated. Using git-filter-branch is no longer recommended, because it is difficult to use and awfully slow on big repositories.
git-filter-repo is much faster and simpler to use.
git-filter-repo is a Python script, available at github: https://github.com/newren/git-filter-repo . When installed it looks like a regular git command and can be called by git filter-repo.
You need only one file: the Python3 script git-filter-repo. Copy it to a directory that is included in your PATH. On Windows you may have to change the first line of the script (refer to INSTALL.md). You need Python3 installed on your system, but this is not a big deal.
First you can run
git filter-repo --analyze
This helps you to determine what to do next.
You can delete your DVD-rip file everywhere:
git filter-repo --invert-paths --path-match DVD-rip
Filter-repo is really fast. A task that took around 9 hours on my computer with filter-branch was completed in 4 minutes by filter-repo. You can do many more nice things with filter-repo. Refer to the documentation for that.
Warning: Do this on a copy of your repository. Many actions of filter-repo cannot be undone. filter-repo will change the commit hashes of all modified commits (of course) and all their descendants down to the last commits!
These commands worked in my case:
git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
It is a little different from the above versions.
For those who need to push this to github/bitbucket (I only tested this with bitbucket):
# WARNING!!!
# this will completely rewrite your bitbucket refs
# and will delete all remote branches that you don't have locally
git push --all --prune --force
# Once you pushed, all your teammates need to clone repository again
# git pull will not work
According to GitHub Documentation, just follow these steps:
Get rid of the large file
Option 1: You don't want to keep the large file:
rm path/to/your/large/file # delete the large file
Option 2: You want to keep the large file in an untracked directory
mkdir large_files # create directory large_files
touch .gitignore # create .gitignore file if needed
echo '/large_files/' >> .gitignore # untrack directory large_files
mv path/to/your/large/file large_files/ # move the large file into the untracked directory
Save your changes
git add path/to/your/large/file # add the deletion to the index
git commit -m 'delete large file' # commit the deletion
Remove the large file from all commits
git filter-branch --force --index-filter \
"git rm --cached --ignore-unmatch path/to/your/large/file" \
--prune-empty --tag-name-filter cat -- --all
git push --force <remote> <branch>
I ran into this with a bitbucket account, where I had accidentally stored ginormous *.jpa backups of my site.
git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all
Replace MY-BIG-DIRECTORY-OR-FILE with the folder or file in question to completely rewrite your history (including tags).
source: https://web.archive.org/web/20170727144429/http://naleid.com:80/blog/2012/01/17/finding-and-purging-big-files-from-git-history/
Just note that these commands can be very destructive. If more people are working on the repo, they'll all have to pull the new tree. The three middle commands are not necessary if your goal is NOT to reduce the size. Otherwise they are needed, because filter-branch keeps a backup of the original refs (under refs/original/), so the removed file can stay in the repository for a long time.
$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
$ git push origin master --force
git filter-branch --tree-filter 'rm -f path/to/file' HEAD
worked pretty well for me, although I ran into the same problem as described here, which I solved by following this suggestion.
The pro-git book has an entire chapter on rewriting history - have a look at the filter-branch/Removing a File from Every Commit section.
If you know your commit was recent, instead of going through the entire tree do the following:
git filter-branch --tree-filter 'rm -f LARGE_FILE.zip' HEAD~10..HEAD
This will remove it from your history
git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch bigfile.txt' --prune-empty --tag-name-filter cat -- --all
Use Git Extensions, a UI tool. It has a plugin named "Find large files" which finds large files in repositories and allows removing them permanently.
Don't use 'git filter-branch' before using this tool, since it won't be able to find files removed by 'filter-branch' (although 'filter-branch' does not remove files completely from the repository pack files).
I basically did what was on this answer:
https://stackoverflow.com/a/11032521/1286423
(for history, I'll copy-paste it here)
$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
$ git push origin master --force
It didn't work, because I like to rename and move things a lot. So some big files were in folders that had been renamed, and I think gc couldn't delete them because tree objects still referenced those files.
My ultimate solution to really kill it was to:
# First, apply what's in the answer linked in the front
# and before doing the gc --prune --aggressive, do:
# Go back at the origin of the repository
git checkout -b newinit <sha1 of first commit>
# Create a parallel initial commit
git commit --amend
# go back on the master branch that has big file
# still referenced in history, even though
# we thought we removed them.
git checkout master
# rebase on the newinit created earlier. By reapply patches,
# it will really forget about the references to hidden big files.
git rebase newinit
# Do the previous part (checkout + rebase) for each branch
# still connected to the original initial commit,
# so we remove all the references.
# Remove the .git/logs folder, also containing references
# to commits that could make git gc not remove them.
rm -rf .git/logs/
# Then you can do a garbage collection,
# and the hidden files really will get gc'ed
git gc --prune --aggressive
My repo (the .git directory) shrank from 32MB to 388KB, which even filter-branch couldn't achieve.
git filter-branch is a powerful command that you can use to delete a huge file from the commit history. The file will stay for a while, and Git will remove it in the next garbage collection.
Below is the full process for deleting files from the commit history. For safety, the process below runs the commands on a new branch first. If the result is what you need, reset the branch you actually want to change to it.
# Do it in a new testing branch
$ git checkout -b test
# Remove file-name from every commit on the new branch
# --index-filter, rewrite index without checking out
# --cached, remove it from the index but not from the working tree
# --ignore-unmatch, ignore if files to be removed are absent in a commit
# HEAD, execute the specified command for each commit reached from HEAD by parent link
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch file-name' HEAD
# If the output is OK, point master at the rewritten history
$ git checkout master
$ git reset --hard test
# Remove test branch
$ git branch -d test
# Push it with force
$ git push --force origin master
NEW ANSWER THAT WORKS IN 2022.
DO NOT USE:
git filter-branch
This command might not change the remote repo after pushing. If you clone after using it, you will see that nothing has changed and the repo still has a large size. This command is old now. For example, if you use the steps in https://github.com/18F/C2/issues/439, this won't work.
You need to use
git filter-repo
Steps:
(1) Find the largest files in .git:
git rev-list --objects --all | grep -f <(git verify-pack -v .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)
(2) Start filtering these large files:
git filter-repo --path-glob '../../src/../..' --invert-paths --force
or
git filter-repo --path-glob '*.zip' --invert-paths --force
or
git filter-repo --path-glob '*.a' --invert-paths --force
or
whatever you find in step 1.
(3) Re-add your remote, since git filter-repo removes the origin remote as a safety measure:
git remote add origin git@github.com:.../...git
(4)
git push --all --force
git push --tags --force
DONE!!!
You can do this using the branch filter command:
git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD
When you run into this problem, git rm will not suffice, as git remembers that the file existed once in our history, and thus will keep a reference to it.
To make things worse, rebasing is not easy either, because any references to the blob will prevent git garbage collector from cleaning up the space. This includes remote references and reflog references.
I put together git forget-blob, a little script that tries removing all these references, and then uses git filter-branch to rewrite every commit in the branch.
Once your blob is completely unreferenced, git gc will get rid of it
The usage is pretty simple: git forget-blob file-to-forget. You can get more info here:
https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/
I put this together thanks to the answers from Stack Overflow and some blog entries. Credits to them!
Other than git filter-branch (slow but pure git solution) and BFG (easier and very performant), there is also another tool to filter with good performance:
https://github.com/xoofx/git-rocket-filter
From its description:
The purpose of git-rocket-filter is similar to the command git-filter-branch while providing the following unique features:
Fast rewriting of commits and trees (by an order of x10 to x100).
Built-in support for both white-listing with --keep (keeps files or directories) and black-listing with --remove options.
Use of .gitignore like pattern for tree-filtering
Fast and easy C# Scripting for both commit filtering and tree filtering
Support for scripting in tree-filtering per file/directory pattern
Automatically prune empty/unchanged commit, including merge commits
This works perfectly for me in Git Extensions:
right-click the selected commit,
choose "Reset current branch to here",
then choose "Hard reset".
It's surprising nobody else is able to give this simple answer.
git reset --soft HEAD~1
It will keep the changes but remove the commit then you can re-commit those changes.
I've seen lots of posts/answers on how to display all of the local commits for all local branches that have not been pushed.
I have a narrow use case, and have not found an answer.
I need to determine from within a bash script, if my CURRENT branch has commits that have not been pushed upstream to the same branch. A count would be fine, but I really just need to know if a push has not yet been done. I don't care about any branches except the current branch, and in this case I've already checked to see if the branch is local (i.e. has not yet set upstream origin).
Mainly, I don't want the commits printed out. I just want to know that the number of unpushed commits is > 0.
The upstream of the current branch is @{u}. @ is a synonym for HEAD.
@..@{u} selects the commits which are upstream, but not local. If there are any, you are behind. We can count them with git rev-list.
# Move 4 commits behind upstream.
$ git reset --hard @{u}^^^
$ git rev-list @..@{u} --count
4
@{u}..@ does the opposite. It selects commits which are local but not upstream.
# Move to upstream
$ git reset --hard @{u}
# Add a commit
$ git commit --allow-empty -m 'test'
[main 6dcf66bda1] test
$ git rev-list @{u}..@ --count
1
And if both are 0 you are up to date.
Note: this will only be "up to date" as of your last fetch. Whether you want to combine this with a fetch is up to you, but one should get used to the fact that Git does not regularly talk to the network.
(I've borrowed from ElpieKay's answer).
See gitrevisions for more about @, .. and many ways to select revisions.
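Wrapped up for a bash script, the count can be turned into an exit status. A minimal sketch, assuming the current branch has an upstream configured; has_unpushed_commits is a hypothetical name, and like everything above it is only as fresh as your last fetch:

```shell
# has_unpushed_commits is a hypothetical helper:
#   exit 0 -> the current branch has commits its upstream lacks (push pending)
#   exit 1 -> nothing to push
#   exit 2 -> no upstream configured, or the upstream ref does not exist
has_unpushed_commits() {
  local ahead
  ahead=$(git rev-list --count '@{u}..HEAD' 2>/dev/null) || return 2
  [ "$ahead" -gt 0 ]
}

# Example:
#   if has_unpushed_commits; then echo "push needed"; fi
```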
Suppose the branch is foo.
git fetch origin foo
git rev-list FETCH_HEAD..foo --count
The 1st command gets the latest head of the foo in the remote repository and stores its commit sha1 in FETCH_HEAD.
git rev-list returns a number. If it's 0, all commits of the foo in the local repository have been included in the foo branch in the remote repository. If it's larger than 0, there are this number of commits not included in the remote foo yet.
Being included is different from being pushed if the procedure involves a pending change like a pull request or a merge request. The commands can determine if the local branch's commits have been merged into (included in) its remote counterpart.
You're looking for git status. It will give you output like:
On branch main
Your branch is ahead of 'origin/main' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
And you can grep for this with:
git status | grep -E "Your branch is ahead of '.*' by ([0-9]*) commit."
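Note that the human-readable git status messages can change between Git versions and are translated according to your locale, so grepping them is fragile. git status --porcelain=v2 --branch prints a machine-readable header line of the form "# branch.ab +<ahead> -<behind>" when an upstream is set. A sketch that extracts it (ahead_behind is a hypothetical name):

```shell
# ahead_behind is a hypothetical helper: print "<ahead> <behind>" for the
# current branch, taken from the porcelain v2 "# branch.ab +A -B" header.
# Prints nothing when there is no upstream (or it is missing locally).
ahead_behind() {
  git status --porcelain=v2 --branch |
    awk '$1 == "#" && $2 == "branch.ab" { print substr($3, 2), substr($4, 2) }'
}
```

For example, with one local commit on top of the upstream it prints "1 0".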
Searching in StackOverflow and on Google, I can find lots of items about how to sync, and do everything with git, except I can't find a reliable way using a bash script to tell if my local repository is in sync with its remote. I've tried:
1.
git branch -v |
perl -wlne'
print "$ENV{reponame} [$1] --> $3 $2"
if /^..(\S+)\s+([a-f0-9]+)\s+(\[(?:ahead|behind)\s+\d+\])/
' |
while IFS= read -r MOD; do
printf ' %s\n' "$MOD" # Replace with code that uses $MOD
done
and,
2.
isInSync=`git remote update>/dev/null; git status -uno | grep -o "is up to date"`
if [ $? -ne 0 ]; then
echo " $reponame [$br] --> Out of sync with remote at $d"
ok=false
fi
and have been trying to find a way to use:
git log --oneline | head -n1 | sed # this I don't know.
to try to get the first word in the log file, which is the commit hash of the last commit, and compare it with:
rev=$(git rev-parse --short HEAD)
which is the commit hash of the local repository branch you are in.
The problem with #1 is that it doesn't seem to pick up when the local is out of sync with the remote.
The problem with #2 is that it causes the local .git/config to get involved and produces odd attempts to access different remote repositories like Heroku.
The problem with #3 is that I can't figure out how to get the hash code from the git log and then compare it to the $rev above. It would seem to be the best bet as when I check it on different computers in different states, it seems to convey the right information.
I am writing a bash script that checks a group of git projects and tells me their states, i.e. up to date, untracked files, uncommitted files, and out of sync with the remote.
Help would be appreciated in either suggesting a better way or how to do the extraction of the commit-hash from the log and compare it to the current commit-hash of the local last commit.
As you are seeing, you have to define what you mean by in sync.
A Git repository first and foremost is a way to hold commits. The set of commits in the repository, as found by branch and tag names and other such names, is what is in the repository. The commits are the history.
Most Git repositories that users work with, however, are not "bare". They have a work-tree or working tree into which files can be extracted from any commit, and/or additional files created and/or various files modified and so on. This working area can be "clean" (matches a commit, more or less) or "dirty".
Is a "dirty" repository, with lots of work going on inside it, "in sync" with some other repository, even if both repositories have exactly the same set of commits? Or are these "not in sync"? What if the other repository has a work-tree and it's "dirty" in exactly the same way? That's something you need to define.
(Besides the work-tree, all repositories—even bare ones—have an index as well, which can also be "clean" or "dirty", and perhaps you should factor that in as well.)
A repository can have one or more remotes defined as well. Given a remote name, such as origin, Git can be told: connect to some network URL and obtain new commits from a Git over there, and put them into this repository. That's what your git remote update is doing. How many remotes that contacts—it could get some of them, or all of them, or maybe some are unreachable at the moment—is difficult to answer, as this is all quite configurable.
... [get] the commit hash of the last commit
Each branch name automatically holds the hash ID of the last commit in that branch. There can be more than one "last commit", in other words. Using HEAD is the right way to find the hash ID of the current commit, but this may not be the tip commit of any branch:
rev=$(git rev-parse HEAD)
If HEAD contains the name of an unborn branch, this step will fail. That state is the case in any totally-empty repository (because there are no commits, hence there can be no branch names; branch names are required to name some existing commit). It's also the state after a git checkout --orphan operation, however.
I am writing a bash script that checks a group of git projects and tells me their states, i.e. up to date, untracked files, uncommitted files, and out of sync with the remote.
So, you get to choose how to define each of these things.
In general, I would:
Optionally, have each repository contact its main upstream(s), whatever those may be: probably those defined by whatever git remote update does; consider here whether it's good or bad to allow --prune (see also the fetch.prune setting).
Check at least the current branch, as reported by git symbolic-ref HEAD: if this command fails, there is no current branch, i.e., we're on a detached HEAD.
Check the status as reported by git status --porcelain=v2 or similar, to look at the state of the index and work-tree. Consider checking submodule status here as well; git status may do this for you, or not, depending on settings.
Use the current branch's upstream setting, if (a) there is a current branch and (b) it has an upstream. This upstream setting is often the name of a branch in another Git repository. (It can instead be the name of a branch in this repository.) If so, use git rev-list --left-right --count branch...branch@{u} to count the number of commits ahead and/or behind, perhaps after the git remote update with or without --prune.
Optionally, check each branch name, keeping in mind that each Git repository has its own branch names and there's no particular reason, other than convenience and convention, to use the same names in two different Git repositories. That is, my dev might not be related to origin/dev, for instance. If my dev goes with origin/develop I probably set the upstream of dev to origin/develop, so consider checking each branch's upstream. Note that git branch -vv does this (and also counts ahead/behind values, unless told not to, and also has extra support for added work-trees now).
Except for current branch and dirtiness (which git status already reports), most of the work is just git remote update -p and git branch -vv, really.
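The checklist above can be sketched as a small per-repository function. This is only one possible reading of "in sync": it treats untracked files as dirty, does not fetch, and only looks at the current branch and its upstream; repo_state is a hypothetical name.

```shell
# repo_state is a hypothetical helper following the checklist above.
# It prints one line for the repository in $1 (default: current directory).
# It does not fetch: run `git -C DIR remote update -p` first if you want
# fresh remote-tracking data. Untracked files count as "dirty" here.
repo_state() {
  local dir=${1:-.} branch counts ahead behind
  # No current branch (detached HEAD): nothing to compare with an upstream.
  if ! branch=$(git -C "$dir" symbolic-ref --quiet --short HEAD); then
    echo "detached HEAD"
    return 0
  fi
  # Any porcelain status output means index or work-tree changes.
  if [ -n "$(git -C "$dir" status --porcelain)" ]; then
    echo "$branch: dirty"
    return 0
  fi
  # Left count: commits only on the branch; right count: only on upstream.
  if ! counts=$(git -C "$dir" rev-list --left-right --count "$branch...$branch@{u}" 2>/dev/null); then
    echo "$branch: no upstream"
    return 0
  fi
  read -r ahead behind <<< "$counts"
  echo "$branch: ahead $ahead, behind $behind"
}
```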
In the above code for git_check, I replace the section:
2.
isInSync=`git remote update>/dev/null; git status -uno | grep -o "is up to date"`
if [ $? -ne 0 ]; then
echo " $reponame [$br] --> Out of sync with remote at $d"
ok=false
fi
with:
last_commit=`git log --oneline | head -n1 | grep -o "^\w*\b"`
rev=$(git rev-parse --short HEAD)
if [ "$last_commit" != "$rev" ]; then
echo " $reponame [$br] --> Out of sync with remote at $d"
ok=false
fi
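One caveat about this replacement: the first line of git log --oneline is the HEAD commit itself, so last_commit and rev appear to always hold the same abbreviated hash. If the idea is to compare HEAD with the branch tip on the remote, one option is to ask the remote directly with git ls-remote, which needs no fetch. A sketch, assuming the remote branch has the same name as the local one (which, as torek notes, is only a convention); head_matches_remote is a hypothetical name:

```shell
# head_matches_remote is a hypothetical helper. It asks the remote ($1,
# default "origin") for the tip of the branch with the same name as the
# current one and compares it with HEAD. No fetch is needed, but it only
# says "same" (exit 0) or "different/missing" (exit 1), not ahead/behind.
head_matches_remote() {
  local remote=${1:-origin} branch local_rev remote_rev
  branch=$(git symbolic-ref --quiet --short HEAD) || return 1
  local_rev=$(git rev-parse HEAD) || return 1
  # ls-remote prints "<hash><TAB><refname>"; keep the exact ref only.
  remote_rev=$(git ls-remote "$remote" "refs/heads/$branch" |
    awk -v ref="refs/heads/$branch" '$2 == ref { print $1 }')
  [ -n "$remote_rev" ] && [ "$local_rev" = "$remote_rev" ]
}
```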
Here is the Console output with some comments
Modify a file that has been committed.
[:~/bin] develop(+4/-2) 2s ± git status
On branch develop
Your branch is up to date with 'origin/develop'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: git_check
no changes added to commit (use "git add" and/or "git commit -a")
[:~/bin] develop(+4/-2) ± git_check
bin [develop] --> Changes to be staged/committed at /Users/userid/bin
[:~/bin] develop(+4/-2) 128 ± git add -A
[:~/bin] develop(+0/-0) ± git diff --cached --shortstat
1 file changed, 4 insertions(+), 2 deletions(-)
[:~/bin] develop(+0/-0) ± git_check
bin [develop] --> Changes to be committed at /Users/userid/bin
[:~/bin] develop(+0/-0) 2s ± git commit -m "Better way to check for remote sync"
[develop fab4f1d] Better way to check for remote sync
1 file changed, 4 insertions(+), 2 deletions(-)
[:~/bin] develop(1) ± git_check
OK --> bin [develop] fab4f1d
bin [develop] --> [ahead 1] fab4f1d
[:~/bin] develop(1) 2s ± git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 4 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 416 bytes | 416.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/userid/repo.git
ee0f0ca..fab4f1d develop -> develop
[:~/bin] develop 4s ± git_check
OK --> bin [develop] fab4f1d
For me, at least, this appears to solve my problem. As the console output shows, it tells me what is happening at each stage of the workflow. In particular, it tells me that my local repository is ahead of the remote repository by 1 commit, so I need to push. On the other machines, I will see that they are behind by 1, so I need to pull to sync up.
Thanks to torek for the helpful information.
What the title says.
I want to reset each and every local branch to match my remote repository, including deleting some branches and tags which only exist locally, without having to delete everything and clone from scratch. All I could find are instructions on how to reset a specific branch, but not the entire repository.
Even better if it can be done from the TortoiseGit Shell extension. But I'm also fine with the command line.
You can do it with the following commands:
git checkout --orphan #
git fetch <Remote> refs/*:refs/* --refmap= --prune --force
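where <Remote> is the remote repository you want to use. This fetches all remote refs (refs/*:refs/*) directly into the matching local refs: --prune deletes local refs that no longer exist on the remote, --force updates refs even when the update is not a fast-forward, and the empty --refmap= makes git fetch ignore the remote's configured fetch refspecs and use only the one given on the command line.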
The following line resets every local branch that has a configured upstream branch to the state of that upstream branch:
git checkout @{0} && git for-each-ref refs/heads --format '%(refname:strip=2)' | xargs -ri sh -c 'git rev-parse {}@{u} >/dev/null 2>&1 && git branch -f {} $(git rev-parse {}@{u})'
The first command leaves you with a detached HEAD, because git branch -f cannot reset the currently checked-out branch. Afterwards, check out the branch you want to have in your working directory.
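For readability, the one-liner can also be written as a loop. This is a sketch assuming a POSIX shell; the function name reset_all_to_upstream is hypothetical, and git checkout --detach is used here to detach HEAD. Like the one-liner, it does not fetch first, skips branches without an upstream, and does not delete local-only branches or tags (the git fetch approach above handles that).

```shell
#!/bin/sh
# Hypothetical helper: force every local branch that has an upstream
# to point at that upstream's current commit. Branches without an
# upstream are left untouched. Run "git fetch" beforehand.
reset_all_to_upstream() {
    # Detach HEAD first: "git branch -f" refuses to move the checked-out branch.
    git checkout -q --detach || return 1
    git for-each-ref --format='%(refname:strip=2)' refs/heads |
    while read -r branch; do
        # "<branch>@{u}" names the upstream; skip branches that have none.
        if upstream=$(git rev-parse -q --verify "$branch@{u}" 2>/dev/null); then
            git branch -f "$branch" "$upstream"
        fi
    done
}
```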
Sorry for the verbose title.
In fastlane, using the programming language Ruby, I want to have this functionality:
if remote_branch_exist
git_clone_remote_branch
else
git_clone_master
git_branch
git_push_branch_to_master
end
I have searched for a one liner git command that does that, but not succeeded. Is it possible?
I have written this code that does exactly what I want. But it must surely be an unnecessary amount of code.
def git_clone_sdk_repo(path_repo: nil)
some_branch = "some_branch"
git_url = "git@github.com:MyComp/MyRepo.git"
if check_if_remote_branch_exists(git_url: git_url, branch_name: some_branch)
puts "remote branch exists"
sh "git clone -b #{some_branch} #{git_url} #{path_repo}"
else
puts "no remote branch"
sh "git clone #{git_url} #{path_repo}"
pwd = Dir.pwd
FileUtils.cd(path_repo)
sh "git checkout -b #{some_branch}"
sh "git push --set-upstream origin #{some_branch}"
FileUtils.cd(pwd)
end
end
def check_if_remote_branch_exists(git_url: nil, branch_name: nil)
check_if_remote_branch_exists = "git ls-remote --heads #{git_url} #{branch_name} | wc -l | grep -o -q '1'"
system(check_if_remote_branch_exists)
end
(The method sh in the code block above is used to call CLI commands. I think it is part of fastlane.)
Running this command:
git clone -b <some_branch> <git_url> <path_repo>
results in:
fatal: Remote branch <some_branch> not found in upstream origin
if there is no branch in the remote with that name. That is why I first check whether a remote branch with that name exists.
What neat git command am I missing?
Let me re-express this task in Git terms, rather than as Ruby code.
You wish to:
Clone a repository from some URL. We will then save that URL under the usual "remote" name, origin.
Given a branch name such as foo, check out that particular branch (so that the current commit is the tip commit of that branch).
If the branch can be derived from a remote-tracking branch (as is usually true for master, which usually derives from origin/master), you want Git to create this branch locally, with the corresponding remote-tracking branch set as its upstream, ready to do work on it. Hence if branch foo exists in the Git repository on origin, so that origin/foo will exist in the local repository, you want to create local branch foo with origin/foo as its upstream.
If not, however (if there is no corresponding upstream name, so that the branch is going to be a new branch), you want to create that new branch so that it points to the same commit that origin/master points to. In this case, you also want to immediately (or as quickly as possible) ask the Git on origin to create this branch name, pointing to that very same commit, and on success, set foo to have origin/foo as its upstream. Ideally, the end result of this process is that local branch foo exists and has origin/foo as its upstream.
You have observed that if foo exists on the remote, git clone -b foo <url> <directory> does the trick in one clean step (although as a side effect, the local clone will not have a master branch yet!). If foo does not exist on the remote, though, the clone fails.
Unfortunately, there is no single Git command that can do all this. Moreover, there is an atomicity issue here ("atomicity" having its usual meaning in database or parallel programming terms): the fact that foo does not exist during the cloning step does not mean that foo will not exist by the time you ask the upstream repository to create it.
The "best" answer to all of this depends on how much you care about this atomicity problem (solving it generally just moves atomicity issues to a later push step, since branch foo could be removed on the server by then, or have acquired extra commits, or been rewound and rewritten, or whatever). But in the end you must use multiple Git commands.
Method 1
The sequence that uses the least network traffic is to clone without -b. In this case, your clone will check out some branch all on its own—usually master, but the actual branch chosen will depend on what is in the HEAD entry for the Git at the URL that will be stored in the remote. Your clone will then have the remote's URL saved as usual, under the name origin (or any -o argument you supply).
Now you can simply attempt to git checkout foo. Either foo is already the current branch (because it was in HEAD on the remote), so that this is a successful no-op, or foo isn't the current branch. In the latter case, this will create foo as a local branch with origin/foo set as its upstream if and only if origin/foo exists. This origin/foo will in turn exist if and only if a branch named foo existed on the remote at the time you did the clone (see "atomicity").
If the git checkout fails, you can assume that origin/foo does not exist. (The only other possibility is that things are going very badly wrong, e.g., you have run out of disk space or the storage device is failing, or there are bugs in Git: in both cases all bets are off.) You can at this point go down your "create foo pointing to the same commit as origin/master and use git push -u to ask to create it on origin too" path, and verify that this all works. As usual with git push, you are now racing against anyone else creating foo. Note also that there may not be an origin/master in your own repository, if there was no master on the other Git at the time you did the clone.
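Method 1 might be sketched in shell like this. It is only an illustration under stated assumptions: the function name clone_with_branch is hypothetical, the fallback base branch is assumed to be master, and errors are reported by returning a non-zero status rather than through a die helper. The trailing -- on git checkout is an addition that keeps Git from treating the branch name as a file path; Git still creates the branch from origin/<branch> in that form.

```shell
#!/bin/sh
# Hypothetical sketch of Method 1. Usage: clone_with_branch <url> <dir> <branch>
# Assumes the fallback base branch on the remote is "master".
clone_with_branch() {
    url=$1
    dir=$2
    branch=$3
    # Plain clone, no -b: checks out whatever branch the remote's HEAD names.
    git clone -q "$url" "$dir" || return 1
    # Either a no-op (already on $branch) or creates $branch from
    # origin/$branch with that as its upstream. "--" forces $branch
    # to be read as a branch name, not a path.
    if git -C "$dir" checkout -q "$branch" -- 2>/dev/null; then
        return 0
    fi
    # No origin/$branch: start the new branch at origin/master's commit,
    # given by hash so the branch does not get origin/master as upstream.
    rev=$(git -C "$dir" rev-parse -q --verify origin/master) || return 1
    git -C "$dir" checkout -q -b "$branch" "$rev" || return 1
    # Ask origin to create the branch and set it as upstream on success.
    # Someone else may create $branch first; this race cannot be closed.
    git -C "$dir" push -q -u origin "$branch:refs/heads/$branch"
}
```

Using git -C instead of cd keeps the caller's working directory unchanged.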
Method 2
You can use git ls-remote as you are doing now, which does one complete round-trip operation to the remote (currently via URL, since there is as yet no local clone, hence no remote named origin to store that URL) to determine the set of references it has. If foo does not exist in that repository, you can ask that Git to create it. You can do this a little bit differently, if you like, using a series of local Git operations in a new repository that, as yet, has nothing at all in it:
mkdir <directory>
cd <directory>
git init
git remote add origin <url>
At this point you can run git ls-remote origin, because now there is a remote named origin. However, there are no local branches at all. Now we run into the usual atomicity issues, and "what to do next" depends once again on how you wish to solve them. But if I were not using method 1 or some slight variant of it, this is what I would do next:
# assumes $branch is set to "foo" as needed, and that
# function "die" prints an error message and exits with failure
git fetch origin # bring over all commits and origin/* branches
if branchrev=$(git rev-parse -q --verify origin/$branch); then
# origin/$branch exists, so we want to act like "git clone -b $branch"
git checkout $branch ||
die "unable to check out $branch, cannot proceed"
else
# origin/$branch does not exist: ask to create it pointing to
# origin/master
rev=$(git rev-parse -q --verify origin/master) ||
die "no origin/master exists, cannot proceed"
git checkout -b $branch $rev ||
die "failed to create $branch"
git push -u origin "$branch:refs/heads/$branch" ||
die "failed to create $branch on origin"
fi
The git checkout -b creates the branch in the local repository, and sets it as the current branch. Since the initial commit ID is given by raw commit hash (due to $rev containing the result from git rev-parse), it will have no upstream. You could instead use git checkout -b $branch origin/master but this will set the upstream for the new branch to origin/master, leaving a trap for the unwary if the git push -u fails for some reason (e.g., network failure). You could use git checkout --no-track -b $branch origin/master, but given the test to make sure origin/master is a valid name, we might as well save the hash ID in $rev and use that.
This same bit of shell script—which you could rewrite in Ruby if you like—can be used after a regular old git clone, instead of using the somewhat obscure git init; git remote add ...; git fetch sequence that does everything git clone would do except for the initial git checkout of whichever branch the remote's HEAD indicates.
(In other words, in practice, I'd just run git clone—without the tricky -b part—first, then do everything in the shell script section above except the git fetch step, which is generally unnecessary right after the clone step. If the clone will take a very long time, the extra git fetch might still be useful, since that will then shrink the atomicity race, at the cost of one more round-trip to the server at origin. Nothing can completely close the race, though.)