How do I ask git to clean out and rebuild the entire working tree? - windows

We have something of an ugly mess that needs cleaning up.
In our codebase, certain developers created directories whose names differed only in case. Normally that would not be an issue; however, we are developing on Windows. This confused git and caused merge conflicts far down the line.
We have since pushed changes to master that get rid of the duplicate directories in the tree (encountering some Azure DevOps bugs in the process); now all that remains is a large number of incorrect directory names on users' workstations. Because of the way git and Windows interact over case, these won't get cleaned up by a git pull.
How do we actually erase and rebuild the working trees without throwing out the local branch state in .git? I need to send out instructions so everybody on the team can clean this mess up and those wrong-case local directories don't magically reappear in future commits.

Delete all files and then do a git reset --hard.
git rm -r .
# might need an additional rm outside of git if there are directories left over there
# wait 30 seconds or Windows will restore the wrong case
git reset --hard
Just be careful not to have any uncommitted changes hanging around.
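The steps above can be sketched as a small script. This is only an illustration under assumptions: it builds a throwaway repository in a temporary directory (the names work, Src and file.txt, and the placeholder identity, are made up), checks for uncommitted changes first, and shows the Windows-specific manual cleanup and pause as a comment only.

```shell
#!/bin/sh
set -e
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# Throwaway repository; in real use you would cd into your existing clone.
work=$(mktemp -d)
cd "$work"
git init -q
mkdir -p Src && echo "hello" > Src/file.txt
git add . && git commit -q -m "initial"

# 1. Refuse to continue if anything is uncommitted: reset --hard would lose it.
if [ -n "$(git status --porcelain)" ]; then
    echo "uncommitted changes present; commit or stash them first" >&2
    exit 1
fi

# 2. Remove every tracked file from the index and the working tree.
git rm -r -q .
# (On Windows: delete leftover wrong-case directories by hand here, then
#  wait as suggested above before continuing.)

# 3. Rebuild the index and working tree from the current commit.
git reset -q --hard
restored=$(cat Src/file.txt)
echo "restored: $restored"
```

The status check is an addition to the answer's steps; it simply makes the "no uncommitted changes" warning enforceable before anything is deleted.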

Related

How to use Git commands to add multiple separate folders to GitHub such that each folder can be updated separately

So I am new to this whole GitHub and Git thing. I recently learned the basics of Git (adding, pushing, pulling, cloning, etc). My intro to Java professor asked me to make a GitHub repository for all my class homework. She told me to organize it in such a way that there are separate folders for each homework and each homework folder contains multiple source files.
So I set up my files like this:
Java (main folder) -> Hw1 + Hw2 + Hw3 etc. How would I do this using git? All of these folders should be on my local and GitHub repositories and I should be able to make changes to them separately.
Thank You in advance. I am stuck.
Let's start with some basics. You already understand that your computer uses a tree-structured file system: that is, a directory (or folder—the terms are now interchangeable) holds files and/or more directories/folders, which in turn hold more files and/or folders, etc. Windows natively uses a backwards slash \ to separate the various components, so that you might have:
java\hw1\main.java
java\hw1\sub.java
java\hw2\main.java
and so on. Windows can use forward slashes (some commands may use them for other purposes, but they do work in file names), and all non-Windows OSes tend to use forward slashes, which are easier to type. Git also uses forward slashes so that's what I'll do here.
(Aside: Windows and macOS by default use "case insensitive but case preserving" rules, so that if you create a file named readme.txt, you can later open it using the name ReadMe.txt or README.TXT but it remains named in all-lowercase. Git, by contrast, is usually case-sensitive and thinks that readme.txt, ReadMe.txt, and README.TXT are three different file names. This causes endless grief on such systems,[1] and sometimes the best, or at least easiest, way to avoid all problems here is to completely avoid uppercase letters everywhere. To the extent that you can use java instead of Java, hw1 instead of Hw1, and so on, I would encourage you to do so.)
When you ask Git to create a new, empty repository using git init,[2] Git creates a hidden folder named .git. This hidden folder will contain all of Git's files: here, Git will store its main two databases. We'll talk about those in just a moment. The place where Git creates .git is whatever your current working directory is, so if you are in java/hw1 and run git init, Git creates java/hw1/.git. If you are in java and run git init, Git creates java/.git.
Note that java/.git and java/hw1/.git are different folder path names, and therefore you can create two repositories. You do not want to do this, but that's what you did. (I base this claim on this comment.) We'll come back to "how to fix this" soon.
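To check whether you have ended up with repositories inside repositories, something like the following may help. It is a sketch that recreates the situation in a temporary directory (the java and hw1 names mirror the example above; everything else is assumed for illustration) and then lists every .git directory it can find.

```shell
#!/bin/sh
set -e
# Recreate the accidental layout in a scratch location.
top=$(mktemp -d)
mkdir -p "$top/java/hw1" "$top/java/hw2"
git init -q "$top/java"        # creates java/.git
git init -q "$top/java/hw1"    # creates java/hw1/.git, a second repository

cd "$top/java"
# List every .git directory at or below the current directory.
nested=$(find . -name .git -type d)
echo "$nested"
count=$(echo "$nested" | wc -l | tr -d ' ')
echo "repositories found: $count"

# From inside hw1, ask Git which repository it thinks it is in.
cd hw1
inner_top=$(git rev-parse --show-toplevel)
echo "hw1 belongs to: $inner_top"
```

More than one line of output from find means there is more than one repository; git rev-parse --show-toplevel shows that commands run inside hw1 act on the inner repository, not the outer one.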
[1] In particular, someone using Linux can literally create three different files that differ only in case, stuff all three into a commit in a Git repository, and leave you with a problem when you go to check out this commit on a Windows system. If you're used to the system mapping from typed-in-lowercase to the matching case, and you ask an editor to create java/hw1/thing.java on Linux, it might actually create a java and hw1 right next to your existing Java and Hw1. Since those are different directories they can store different files with the exact same names as those in Java/Hw1/, including name-case. Git will happily store all these files, and Windows often cannot extract such a commit properly.
[2] Note that git init will first check to see if you're already in some existing repository. In this case, rather than creating a new repository, Git will "reinitialize" the existing repository. In most cases "reinitializing" like this has no effect at all.
The main thing to know about Git and a Git repository
A Git repository—or what I sometimes call the repository proper—consists mainly of two databases. One is usually much bigger. It contains commits and other supporting Git objects. These objects all have hash IDs (or more formally, object IDs or OIDs) that Git must have in order to retrieve the objects from the database. This could force humans to memorize Git commit hash IDs, but that's a bad plan: hash IDs are very large, very random-looking, and impossible for humans to remember in general.
For this reason, a Git repository contains that second, usually much smaller, database. In this database, Git stores names: branch names, tag names, remote-tracking names, and many other kinds of names. These names are for you (and other humans) to use. Each name stores one hash ID, but that's enough to make everything work. So you'll use a branch name, like main or master. This name holds the hash ID of the latest commit, which allows Git to retrieve that commit.
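A quick way to see the names database at work is to ask Git what each name currently holds. The sketch below uses a scratch repository with one commit, one branch and one tag (the file name, tag name v1 and placeholder identity are made up for the example).

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called main
echo one > file.txt
git add file.txt && git commit -q -m "first"
git tag v1                              # a second name for the same commit

# Every name in the repository, with the hash ID it stores.
git for-each-ref

# Turn a single name into its hash ID.
branch_hash=$(git rev-parse main)
tag_hash=$(git rev-parse v1)
echo "main -> $branch_hash"
echo "v1   -> $tag_hash"
```

Both names resolve to the same hash ID here because they both point at the only commit.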
Each commit stores two things:
A commit stores a full snapshot of every file (that Git knew about, that is) at the time you, or whoever, made that commit. The files inside the commit are stored in a special, read-only, Git-only, compressed and de-duplicated form, that only Git can read, and literally nothing can write. (This uses some of those "supporting objects" I mentioned; the files are actually stored in the objects database as "Git objects".) Because nothing but Git can use these files, the files in a commit are useless on their own. We'll see in just a moment how we work with these files.
Meanwhile, that same commit that's storing a snapshot, also stores some metadata, or information about the commit itself: who made it (you, probably), and when, for instance. To make "branches"—a poorly-defined word in Git (see What exactly do we mean by "branch"?)—work, the commit's metadata contains the hash ID of the previous commit.
This "contains previous commit's hash ID" is how Git stores history: the branch name, e.g., main, lets Git find the last commit you made, and then by reading that commit, Git can find the hash ID of the second-to-last commit. For instance, suppose the hash ID of the last commit is H (it's actually some big ugly hexadecimal number so we're just using H to stand in for it). Then we say that the name main points to commit H. But commit H contains the hash ID of an earlier, or parent, commit: let's call that one G. We say that H points to G, and we can draw that:
<-G <-H <--main
Since G is a commit, it has one of these points-to pointers sticking out of it, too. By reading commit G's metadata, Git can find the raw hash ID of its parent; let's call that commit F:
... <-F <-G <-H <--main
So main points to H, which points to G, which points to F, which points to ... well, this goes on until we get back to the very first commit ever—commit A perhaps—which, being first, can't point backwards and therefore simply doesn't.
What this means is that instead of one hash ID, each commit stores, in its metadata, a list of previous-commit hash IDs. The list can be empty, and is for that first commit. It can also have more than one hash ID, but we won't cover this case here. Most commits in most repositories are "ordinary" commits and have exactly one parent, though.
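To look at the metadata described above, git cat-file -p prints a commit's raw contents. The following sketch makes two commits in a scratch repository (file names and identity are placeholders) and shows that the second one records the first as its parent, while the first has no parent line at all.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two > file.txt && git add file.txt && git commit -q -m "second"

# Raw commit: a tree line (the snapshot), parent line(s), author, committer,
# a blank line, then the message.
git cat-file -p HEAD

# The parent line of the latest commit names the earlier commit.
parent_line=$(git cat-file -p HEAD | grep '^parent ')
# The very first commit has an empty parent list.
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
echo "$parent_line"
echo "parent lines in first commit: $root_parents"
```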
Your "working tree"
A repository, then, stores names—branch names for instance—that help Git find commits for us (we only have to remember the branch names), and stores commits that then store files. But the stuff in the commits (along with the actual commits themselves) is all completely read-only. Git must do this to make the hashing scheme work. What good are stored files if we can't write on them? Moreover, only Git can read them, so what good are they if we can't even read them?
This is where your working tree comes in. Most Git repositories have a working tree.[3] The working tree of a repository is, quite simply, where you do your work. And, as we saw earlier, if you use git init in some directory to create a new, totally-empty repository and then make an initial commit:[4]
mkdir new
cd new
echo example > README.txt
git init
git add README.txt
git commit
you will wind up with a hidden .git folder here in the new/ folder we just made (mkdir new) and entered (cd new). The working tree for this Git repository in new/.git is new/, and the file we created—README.txt—in that working directory is now also stored in the first (and so far only) commit in that repository.
If we now modify the one file, and/or add a new file, and use git add and git commit appropriately, we'll get a second commit that stores (forever[5]) the new versions of that file. That second commit has, as its parent commit, the first commit, which stores (forever) the earlier version with just the one file in it.
The second commit is now our current commit, and is now the last commit on the main or master branch (whatever its name is).
Git allows us to check out any commit we have stored in the repository. When we do that, Git will erase from our working tree the files that go with the current commit. It will, instead, install into our working tree the files that go with the newly selected commit—which then becomes the current commit.
In this way, we can "go back in time", any time we like, to any older version, stored as a commit in the big database. All we have to do is find its commit hash ID (for which git log comes in handy, for instance). That's not what we'll focus on right now though.
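Going back in time like this might look as follows. This is a sketch in a scratch repository with two versions of README.txt (the contents and placeholder identity are invented for the example); it checks out the older commit by hash ID and then returns to the branch.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "version 1" > README.txt && git add README.txt && git commit -q -m "first"
echo "version 2" > README.txt && git add README.txt && git commit -q -m "second"

git log --oneline                 # find the hash ID of the older commit
first=$(git rev-parse HEAD~1)

git checkout -q "$first"          # working tree now holds the older snapshot
old=$(cat README.txt)

git checkout -q main              # back to the latest commit on main
new=$(cat README.txt)
echo "old: $old / new: $new"
```

Checking out a raw hash ID leaves Git in "detached HEAD" mode; checking out the branch name again puts things back to normal.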
[3] The exception here is a so-called bare repository. We won't cover these here.
[4] These are Unix-shell-style commands as I don't use Windows myself, but this should work in git-bash, which is just a port of bash to Windows for use with Git. You can do all this in PowerShell or even CMD.EXE instead, but some command details might change.
[5] Well, forever, or as long as the commit itself continues to exist. If we remove the commit, we remove its snapshot. This is actually kind of hard to do! However, if we remove the repository proper, we destroy the two databases, which removes all commits, and this is pretty easy to do.
"Nested" repositories: the thing you didn't want, but made
Given that the computer—the host operating system, which is in your case Windows, but this is also true of macOS and Linux—demands and uses a tree-structured file system, we can set up a structure like this:
java
    .git
        <various Git repository control files and databases>
    hw1
        .git
            <various Git repository control files and databases>
        main.java
    hw2
        .git
            <various Git repository control files and databases>
        main.java
and so on. Here we have one repository per hw directory plus one overall containing repository in the java directory.
But here's the problem: Git literally cannot store a Git repository inside a Git commit.[6] Instead of doing so, the "outer" repository (in this case the one in java/.git, whose working tree is the java/* files) will store what Git calls a submodule using what Git calls a gitlink. To store a submodule correctly, you must use git submodule add, not git add; git add creates or updates only the gitlink, which is sort of half a submodule.
If someone does want submodules (but you don't), this git submodule add method is how to make them. The result is that when you clone the java repository, you get files, plus the magic gitlinks, that Git will need in order to run additional git clone commands, one for each submodule. This way, the person who clones the java repository can run git submodule update --init to run a bunch more git clone commands. But again, that's not what you want.
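One way to tell whether an outer repository has picked up a gitlink instead of real files is to look at the mode Git recorded in its index: gitlinks use mode 160000. The sketch below reproduces the accident in a scratch directory (paths, file contents and identity are placeholders).

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

top=$(mktemp -d)
mkdir -p "$top/java/hw1"
cd "$top/java"
git init -q                       # outer repository

# Inner repository with one commit of its own.
(
    cd hw1
    git init -q
    echo 'class Main {}' > main.java
    git add main.java
    git commit -q -m "hw1"
)

# Plain git add on a directory that contains a repository stores a gitlink.
# (Git prints a warning about an embedded repository; hidden here.)
git add hw1 2>/dev/null

git ls-files --stage              # mode, hash ID, stage number, path
mode=$(git ls-files --stage hw1 | cut -d' ' -f1)
echo "hw1 is stored with mode $mode"
```

A mode of 160000 for hw1, with no entry for hw1/main.java, means the outer repository holds only the half-submodule described above.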
[6] There are some tricks to get around this problem if you really need to do it, but it's not a good idea in general. The recent safe.directory stuff is an outgrowth of a security issue that resulted in a CVE when someone discovered such a trick. The tricks that Git allows involve renaming the .git directory; the ones it doesn't allow, or accidentally allowed in the past, result in CVEs. 😀
Fixing the mess
The observations we should make at this point are these:
Git stores commits. It doesn't store files (though commits do store files). It stores commits.
What you want is a single repository with multiple commits, where the first commit (or maybe second; see below) contains a file named hw1/main.java,[8] but no files named hw2/whatever.
What you have now are multiple repositories: one, a superproject, with submodules (or half-submodules) named hw1, hw2, and so on, and then more repositories that get cloned into hw1, hw2, and so on, each containing a main.java and whatever other files.
Now, if we assume (or you verify) that you do not need to save any of the commits in any of these repositories so far, what we can do is simply delete all the .git folders and their contents.
That is, on a Unix-like shell, we would run:
cd java
ls # make sure we're in the right place
rm -rf .git # remove this working tree's Git repository
rm -rf hw1/.git # remove the Git repository in hw1/
rm -rf hw2/.git # and so on ...
Note that we're using the OS's remove command, with the "remove everything without asking" options, on the hidden Git folders. Git has no opportunity to stop us: we're totally bypassing Git here. All of Git's files, including the two big databases, get completely removed. This is likely to be irrecoverable (depending on your OS and whether you're using the OS's "remove irrecoverably" command, or its "move to trash so I can get it back if I change my mind" command, and also depending on whether you have good backups, e.g., macOS Time Machine).
We now have only all of the working trees, with no .git folders: there are no repositories left. But all of the files are still there because the checked-out files were, and still are, in the working trees.
Now we create one new, totally-empty repository in the java directory, that we're still in:
git init
[Git prints message: Initialized empty Git repository in ...]
We now have our initial, totally-empty repository. I like to create a first commit that contains just a README.txt (and maybe one or two similar files):
echo repository for "insert class name here" > README.txt
git add README.txt
git commit -m "Initial commit"
We're now ready to "complete" homework assignment #1:
git add hw1
git commit
(write a good, proper commit message in editor)
By running git add hw1 when there's no Git repository inside hw1, we add all the files that are in hw1 (including any files in any subdirectories inside hw1).
The git commit command commits what's been stored so far, as updated by our git add. So when we commit the addition of hw1, we get README.txt—which we didn't change, so this commit literally re-uses the previous version of the file—plus all the hw1/* files.
We can now "complete" homework assignment #2 with git add hw2 and committing, and so on. We end up with a single repository in the java/.git directory, containing multiple commits: an initial one with the README file and subsequent ones with each homework assignment added. There is just the one branch name and it holds the hash ID of the last commit.
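Put together, the whole repair might look like the sketch below. It first recreates the nested-repository mess in a temporary directory so that it is safe to run (the file contents, commit messages and identity are placeholders), then applies the steps from this section.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

top=$(mktemp -d)
mkdir -p "$top/java/hw1" "$top/java/hw2"
cd "$top/java"
echo 'class Main {}' > hw1/main.java
echo 'class Main {}' > hw2/main.java

# Recreate the mess: an outer repository plus one in each homework folder.
git init -q .
git init -q hw1
git init -q hw2

# The fix: delete every repository, keeping the working-tree files.
rm -rf .git hw1/.git hw2/.git

# One new repository, then one commit per step.
git init -q
git symbolic-ref HEAD refs/heads/main
echo 'repository for "insert class name here"' > README.txt
git add README.txt && git commit -q -m "Initial commit"
git add hw1 && git commit -q -m "Add homework 1"
git add hw2 && git commit -q -m "Add homework 2"

commits=$(git rev-list --count HEAD)
tracked=$(git ls-files | wc -l | tr -d ' ')
git log --oneline
echo "$commits commits, $tracked tracked files"
```

Only run the rm -rf step on your real folders once you are sure none of the existing commits need to be kept.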
Pushing this to GitHub
Your last problem here is that if you have already created a GitHub repository and put some commits in it, your existing GitHub repository is going to be reluctant to lose those commits. You have several options:
You can keep those commits, if you really want to.
You can tell GitHub to completely delete that repository, then create a new one with the same name.
Or, you can use git push --force from your laptop (or other computer) that has your new repository, so as to command the Git software on GitHub to go ahead and lose the old commits from the old repository.
The general idea here, with the last option, is that we (and Git) find commits by starting from some branch name like master or main. That gives us the hash ID of the last commit, and from there, we have Git work backwards.
Suppose we command (not just ask) some GitHub repository to take a new chain of commits. That is, they had:
A <-B <-C <--main
We now make a totally new (empty) repository, and put in two commits: an initial commit D and a second commit E, neither of which have the same hash ID as any of those three commits in the original repository:
D <-E <--main
We run git remote add origin url to set things up so that we can git push to GitHub. If we run:
git push origin main
our Git will send commits D and E to GitHub, then politely ask if they can add commits D-E to their repository. But that would give them:
A <-B <-C
D <-E <-- main
which, they notice, will mean they no longer have any name by which to find commit C, which means they'll "lose" all three hash IDs. So they will say No! If I do that I'll lose my access to some of my commits!
Your Git software reports this as ! [rejected] main -> main (non-fast-forward): it means they are saying they could lose commits. But that's exactly what you want: you want them to lose A-B-C; those commits are no good! So you can use git push --force origin main, which sends D-E again but this time commands them to make their main point to E.
You have to have permission (GitHub add a whole set of permissions that base Git fails to provide), but if you own this GitHub repository, you probably will already have the right permissions.[9] So they'll obey: they will make their branch name main point to commit E, and "forget" commits A-B-C.[10]
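The polite-versus-forced push can be tried out without touching GitHub at all. In this sketch a local bare repository stands in for the GitHub repository (that substitution, the directory names and the identity are assumptions made for the example); the old history has a single commit A and the new one a single commit D.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

top=$(mktemp -d)
git init -q --bare "$top/remote.git"     # stand-in for the GitHub repository

# Old repository: its commit ends up on the remote's main.
git init -q "$top/old" && cd "$top/old"
git symbolic-ref HEAD refs/heads/main
echo old > file.txt && git add file.txt && git commit -q -m "A"
git remote add origin "$top/remote.git"
git push -q origin main

# New, unrelated repository with its own history.
git init -q "$top/new" && cd "$top/new"
git symbolic-ref HEAD refs/heads/main
echo new > file.txt && git add file.txt && git commit -q -m "D"
git remote add origin "$top/remote.git"

# A polite push is refused, because the remote would lose commit A.
if git push -q origin main 2>/dev/null; then
    polite=accepted
else
    polite=rejected
fi
echo "polite push: $polite"

# A forced push commands the remote to move main anyway.
git push -q --force origin main
remote_main=$(git --git-dir="$top/remote.git" rev-parse refs/heads/main)
local_main=$(git rev-parse main)
echo "remote main now at $remote_main"
```

After the forced push the stand-in remote's main holds the same hash ID as the local main, and nothing on the remote names the old commit any more.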
[8] Note that while your OS demands folders with subfolders and files, Git just stores "files with long names that have slashes in them". Git understands the folder-y requirements your OS makes, and can turn hw1/main.java into "file main.java in folder hw1". It will automatically save the OS's hw1/main.java (a file named main.java in a folder named hw1) as the Git file named hw1/main.java.
Normally, you don't need to worry about this whole mess. The time when you do have to worry about it is when you want to store an empty folder in Git, because Git literally can't do that. Git only stores files. There are some tricks for this though: see How can I add a blank directory to a Git repository?.
[9] If you own the repository, the only way you wouldn't have permissions is if you logged on to GitHub and told them to deny permission to yourself. To fix that, log on to GitHub again and tell them to give permission back to yourself.
[10] "Normal" Git setups really do eventually forget (or lose) commits this way. GitHub, however, have their software set up to retain all commits forever. So if you send a bad commit to GitHub, and for whatever reason, you really need it removed, you must contact GitHub support and get them to scrub it off their systems.

When Git deletes or removes a file where does it go?

When Git removes files either through a soft or hard reset - where do these files go? Is there any way to go back to the condition before a mixed reset?
Normally when a file is deleted on an operating system, it goes to a trash can. When files are deleted or removed via Git they seem to go into an ether. Where do these files go?
I have a stack of new files that weren't added properly and I foolishly ran a mixed reset and now these files are no where to be seen.
I'm using SourceTree for OS X by the way.
If you did a --mixed or --soft reset, then the files in your directory would not have gone anywhere, because those types of reset do not affect your working tree. With a --hard reset, the files in your working tree will be deleted.
Just a small explanation of git reset command usage:
The main parameters are soft, hard and mixed. These tell Git what to do with your index and working copy when performing the reset.
Soft
The --soft parameter tells Git to reset HEAD to another commit, but that’s it. If you specify --soft Git will stop there and nothing else will change. What this means is that the index and working copy don’t get touched, so all of the files that changed between the original HEAD and the commit you reset to appear to be staged.
Mixed (default)
The --mixed parameter (which is the default if you don’t specify anything) will reset HEAD to another commit, and will reset the index to match it, but will stop there. The working copy will not be touched. So, all of the changes between the original HEAD and the commit you reset to are still in the working copy and appear as modified, but not staged.
Hard
The --hard parameter will blow out everything – it resets HEAD back to another commit, resets the index to match it, and resets the working copy to match it as well. This is the most dangerous of the three and is where you can cause damage. Data might get lost here!
One more thing: you can recover lost data to some extent using git reflog. The reflog records where HEAD and your branches have pointed, so it helps with lost commits rather than with work that was never committed.
So in your case the files should still be present in your working tree, since you used the --mixed parameter.
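The difference between the three modes is easy to see in a scratch repository. The sketch below (file names, commit messages and identity are placeholders) undoes the same commit three times, once per mode, and captures git status --porcelain after each reset.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
echo a > a.txt && git add a.txt && git commit -q -m "first"
echo b > b.txt && git add b.txt && git commit -q -m "second"

# --soft: only HEAD moves; b.txt stays in the index and the working tree.
git reset -q --soft HEAD~1
soft=$(git status --porcelain)
git commit -q -m "second again"

# --mixed: HEAD and index move; b.txt stays in the working tree, untracked.
git reset -q --mixed HEAD~1
mixed=$(git status --porcelain)
git add b.txt && git commit -q -m "second once more"

# --hard: HEAD, index and working tree all move; b.txt is deleted.
git reset -q --hard HEAD~1
hard=$(git status --porcelain)

echo "soft:  [$soft]"
echo "mixed: [$mixed]"
echo "hard:  [$hard]"
```

In porcelain output, "A " in the first two columns means staged as new, and "??" means untracked; an empty result means the working tree is clean.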
When files are deleted or removed via Git they seem to go into an ether. Where do these files go?
When you do a hard reset git will unlink(2) the files.
The data might still be there on the file system but there are no guarantees. Please read https://unix.stackexchange.com/questions/10883 where some recovery tools are mentioned, mostly for ext3.
The git commit, stash and branch commands are often used to save work. Also pay special attention to the reflog for locating and retrieving runaway leaf changes, especially after a reset.

Files lost in Git repo download! Overwrote my work

I think I've done something rather stupid which may have cost me a couple of days of work. What follows is a question not so much about Git itself as how to recover some files I have lost in the process of trying to use Git on a Mac.
I have been using Atlassian Sourcetree to make Git commits and pushes and to work with other members on a team. I have only been committing, pushing and pulling from Git.
As I've mentioned, I've been using SourceTree, but I wanted to evaluate Github for Mac as well.
At the time, I had made some changes to the files in my Git repo, representing about six hours of work. I did NOT commit or push these changes.
After I installed Github, I stupidly set Github to clone the repo to the same folder on my Mac as I had been making my changes in... essentially, Github downloaded the repo and overwrote all of my changes.
There were some files that were overwritten, and some new files that I created that were deleted.
Is there a way to retrieve these files, either by some Git-based voodoo or some aspect of Mac OS X journaling that I'm not aware of? I would really appreciate hearing about it if there is.
So, from what I remember from having my life destroyed by my stupidity with git, it has a place where you can find your old code.
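Go to your main repo folder, run git fsck --lost-found (this is what fills the lost-found folder), and then type cd .git/lost-found/other/ or cd .git/lost-found/
You should be able to find a set of older files there, named by hash ID, and you can then manually get them back by copying them out. Note that this only finds content that Git had stored at some point (for example via git add); files Git never saw will not be there.
Here are some more links on it: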
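As a sketch of how that recovery can work, the following scratch-repository example stages a new file without committing it, loses it to a hard reset, and then gets its contents back through git fsck --lost-found. The file names, contents and identity are invented; whether this helps in a real case depends on the lost files having been added to Git at some point.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
echo base > base.txt && git add base.txt && git commit -q -m "base"

echo "six hours of work" > new.txt
git add new.txt            # staged, so Git has stored the contents...
git reset -q --hard        # ...but this removes new.txt from the working tree

# Write every unreachable object into .git/lost-found/ (blobs go to other/).
git fsck --lost-found >/dev/null 2>&1

ls .git/lost-found/other/                  # files are named by hash ID
found=$(cat .git/lost-found/other/*)
echo "recovered contents: $found"
```

The recovered files have lost their original names, so with many of them you have to open each one to work out what it was.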
Recovering added file after doing git reset --hard HEAD^
Undo a git pull
http://www.quora.com/Git-revision-control/How-can-I-recover-a-file-I-deleted-in-my-local-repo-from-the-remote-repo-in-Git

How do I ignore filemode changes in Git after conflict resolution?

I'm seriously about to stop coding and become a carpenter. This problem has had me stressed out for quite some time now and there doesn't seem to be any clear solution, other than forcing non-Windows machines to use the file permissions Windows seems to inflict.
Let's begin with the scenario. I have 2 development machines, one running Windows 7, the other Mac OS X. They both are using Eclipse and EGit, and both have cloned the same project from a remote repo. That's where the similarities end, because my Windows machine has a nasty habit of retaining a file mode of 644 (rw-r--r--) on its local repo, while the mode on the Mac defaults to a cool 775 (rwxrwxr-x).
So the problem's obviously the file permissions - Git reports there are files that have changed due to differences in file modes and not actual content. The solution seemed obvious, run the following commands:
git config core.filemode false
git config --global core.filemode false
...which worked like a charm, except when committing and merging resolved conflicts.
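The effect of the setting on ordinary status checks can be seen in a scratch repository. This sketch assumes a filesystem that honours the executable bit (such as a typical Linux or macOS disk); the file name and identity are placeholders. It only covers the plain "is this file modified?" case, not the merge scenario described next.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
git config core.fileMode true
echo 'echo hi' > run.sh
git add run.sh && git commit -q -m "add script"

chmod +x run.sh                            # change the mode, not the content

with_modes=$(git status --porcelain)       # mode change is reported
git config core.fileMode false
without_modes=$(git status --porcelain)    # mode change is ignored

echo "core.fileMode=true:  [$with_modes]"
echo "core.fileMode=false: [$without_modes]"
```

Note that git config without --global only affects the current repository, so the setting has to be applied in every clone where it is needed.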
For example, say files A, B and C exist on both the Windows and Mac repos. Next, let's change the file mode for these 3 files on the Mac machine so that the developer can edit them. Now change some of the contents in file A. Because we're ignoring the file modes (thanks to the commands above) only file A will be committed and pushed, ready for the Windows machine to fetch, merge and enjoy...
Now, let's edit file A again on the Mac and on the Windows machines, effectively causing a potential conflict, and let the Windows machine commit and push file A first. Then, when the Mac user commits their changes to file A and fetches the latest changes from the remote repo, a conflict is obviously created.
After resolving the conflict on the Mac machine and adding file A back to their local repo, committing that merge includes the previously ignored files B and C, and thus highlighting my problem! Why are the previously ignored files being included in this merge commit? This doesn't seem to be a Mac / Windows problem exclusively, as this problem can be recreated both ways...
This probably wouldn't matter if there were only 3 files, but this project I'm referring to includes thousands, and all these up and down push and pulls are insane. Am I missing something really obvious? Any help will be greatly appreciated!
So after a long and often frustrating run with trying to get Windows and Mac OS machines to like each other when comparing notes and changes over Git, it seems to just be the functionality of Git itself that's driven me mad, and a better understanding of how to better use Git seems to be the final answer.
Originally, I wanted to know how to continue ignoring file-mode changes (which Windows seemed to love updating with its own idea of what permissions and modes should be) while I was resolving conflicts created from updates to the files from someone else. Short answer: you can't. It's just the way Git works.
So how would you deal with this situation when it came up? Short answer again: use branching.
In my original way of using Git, I was continually working on my master branch on my local repo - bad idea already - so all my work would be committed there and all conflict resolution would need to be handled there too, forcing all files to be compared again and all permissions and file modes to come into question.
With branching, you work on another branch, commit to that branch, pull updates to your master branch, and merge your other branch with your master branch after that. Suddenly no more conflicts on the master from the remote repo, and you're winning!
Commands to do this (creating a branch from currently selected branch):
git branch newbranch
To checkout your new branch:
git checkout newbranch
To merge your new branch with your master branch (after you've committed to your newbranch, switch to the master first):
git checkout master
git merge newbranch
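For reference, here is the same branch workflow run end to end in a scratch repository. It is a sketch only: the branch name newbranch comes from the commands above, while the files, commit messages and identity are made up, and there is no remote involved.

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is master
echo base > file.txt && git add file.txt && git commit -q -m "base"

git branch newbranch                       # create the branch
git checkout -q newbranch                  # switch to it
echo feature > feature.txt
git add feature.txt && git commit -q -m "work on newbranch"

git checkout -q master                     # back to master
git merge -q newbranch                     # bring the work over

merged=$(cat feature.txt)
git log --oneline
```

Because master had no new commits of its own in this example, the merge is a simple fast-forward.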
Hope this helps someone! -:)
Well, to be honest, I'm not sure how you arrived at your conclusion, since the obvious problem is that git is not ignoring the file mode changes. This happens to us here too.
If you set that flag, it seems to make no difference in some cases, and git still uses file modes to determine changed files.
That has to be a bug in git; what else could it be?
Your answer does not explain the rationale, so I don't think it's the correct answer; it's a workaround.
The correct answer is that git should not use the file mode when it's told not to, yet it's obviously ignoring that and doing it anyway.
Can you explain otherwise?

How do you undo a hard reset in Git Gui or Gitk on Windows?

I'm using Git Gui and Gitk on Windows. How do I undo a hard reset from within the past two hours?
(Is it possible to do this from these applications, without using the command line?)
I saw this SO post, which says that undos are possible before git's garbage collection occurs. I might have quit and reopened one or both of these applications.
If you had changes in your working tree that were not committed when you did git reset --hard, those changes are gone forever. You have to use your memory (in your head) to recreate them.
Changes that were committed after the commit to which you switched are not lost. They likely have no reference pointing to them, making them more difficult to locate. The tool to list all low-level changes to the repo is git reflog.
After you locate the commit you want to return to, note its abbreviated hash in the first column and use git reset --hard <hash> or git checkout <hash> to get the changes back.
I found this useful line on http://quirkygba.blogspot.com/2008/11/recovering-history-with-git-reflog.html:
gitk --all $(git reflog | cut -c1-7)
This will display all the hidden changes in gitk, where you can comfortably view, point, click and create new branches.
As you mentioned, the unreferenced commits are normally kept in the repository for 30 days.
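Undoing a hard reset of committed work through the reflog might look like this. The sketch uses a scratch repository (file names, messages and identity are placeholders) and relies on HEAD@{1}, which is reflog shorthand for "where HEAD was one move ago".

```shell
#!/bin/sh
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

work=$(mktemp -d)
cd "$work"
git init -q
echo one > file.txt && git add file.txt && git commit -q -m "first"
echo two > file.txt && git commit -q -am "second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1        # "second" is no longer on any branch

git reflog                        # lists every position HEAD has had

git reset -q --hard 'HEAD@{1}'    # go back to where HEAD was before the reset
recovered=$(git rev-parse HEAD)
content=$(cat file.txt)
echo "recovered $recovered with content: $content"
```

In a real repository more may have happened since the reset, so read the git reflog output and pick the entry you want rather than assuming it is HEAD@{1}.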
See the answers by Brian Riehman and Pat Notz in the link in the question.
One solution is to use the command line.
In Windows, open a Command Prompt in the directory containing your .git directory.
Type something like the following to see what commit you want to go to:
"c:\Program Files\Git\bin\git.exe" reflog
To go to a certain commit, type something like the following, where the last expression is the SHA1 code of that commit:
"c:\Program Files\Git\bin\git.exe" reset --hard 5eb4080
I don't think you can undo a hard reset to get uncommitted changes back - you can undo a rebase because the blobs are still available, but if you never committed your newest changes to Git ever, anything it overwrote is most likely history. I'd love to find out that I'm wrong though!
