Cannot prevent git from modifying files after rm --cached - windows

I am attempting to run 'git rm -rf --cached .' along with 'git add .' to remove cached files that are now listed in the .gitignore. I use Visual Studio on a Windows computer, and prefer to leave line endings just as they are for this particular situation.
I tried setting core.autocrlf to false using git config command. I tried creating a .gitattributes with the line '* -text', rm'ing the .git/index, and running git reset. So far, every time I add the files back, I get a huge list of modified files.
EDIT: The change in the files is not actually line endings, it is changes in file permissions which I did not request.

Edit: the remaining problem is that the file modes are apparently not stored properly in Windows systems (see also What is git's "filemode"?). To save and restore them, one will need a script, plus the original data:
git ls-files --stage > /tmp/original
To recover the modes, this rather crude pipeline should work:
< /tmp/original \
awk -F$'\t' '/^100755 / { print "git update-index --chmod=+x \"" $2 "\"" }' |
sh
This will attempt to chmod +x files that have been removed by the below sequence, so you can expect some error messages if there are any such files. (It also assumes no files have double quotes in their names.)
Assuming you do not already have a .gitattributes file, here is a six-step process that should work:
Create that .gitattributes file just as you did
Run rm .git/index
Run git checkout HEAD -- .
Run git rm -r --cached .
Run git add .
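Run git rm -f .gitattributes (you can leave this until after verifying that it all worked; the -f is needed because step 5 staged the file but it was never committed, and plain git rm refuses in that state). Run git commit afterward.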
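To try the sequence somewhere harmless first, here is a rough sketch in a throwaway repository. The temporary directory, the file names (keep.txt, build.log) and the identity settings are all invented for the demonstration, and the last step uses -f because the freshly added .gitattributes was never committed.

```shell
# Throwaway repository; every name below is made up for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Commit a file that a later .gitignore would exclude.
printf 'keep\n' > keep.txt
printf 'junk\n' > build.log
git add keep.txt build.log
git commit -q -m "initial"
printf '*.log\n' > .gitignore
git add .gitignore
git commit -q -m "ignore logs"

printf '* -text\n' > .gitattributes   # step 1: request no conversions
rm .git/index                          # step 2: discard the index
git checkout HEAD -- .                 # step 3: refill index and work-tree
git rm -r -q --cached .                # step 4: untrack everything
git add .                              # step 5: re-add, honoring .gitignore
git rm -q -f .gitattributes            # step 6: drop the temporary directive

# build.log should no longer be tracked, but should still exist on disk.
tracked=$(git ls-files)
echo "$tracked"
```

If the listing looks right, the same six commands can be run in the real repository, followed by git commit.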
I do not have (nor use) Windows so cannot test this, but here's the theory behind why it should work, and hence why there are these steps.
Git's actual data storage format is a special, Git-only, compressed (sometimes highly compressed) format. Files stored in this format are mainly useful only to Git itself. This format stores a raw, uninterpreted byte stream: files do not have to be separated into "text" and "data" and so on, they are just raw byte streams (hence treated as "data" / "non-text"). The data, once stored, are read-only and get assigned a hash ID (currently SHA-1 though a future Git may use SHA-256). Git calls a file stored this way a blob, which is a term stolen from the database world.
Your computer's useful-file-storage format is of course different, and may (and does on Windows) make a distinction between "text" and "data". Text may have encodings (such as ISO-8859-1, UTF-8, UTF-16, and so on). These files are generally both readable and writable and anything on your computer can deal with them (to some degree anyway, depending on encoding).
Git has to extract files from commits, turning them from blobs into files that you can work with. These files live in your work-tree. You work with them, and then git add them to give Git a chance to re-blob-ize them.
In between these special Git-only blobs and the work-tree, Git needs a place to store the blobbed data, that—unlike a commit—is writable, but that—like a commit—has the file in the special Git-only format. This "in between" place is Git's index. Various bits of Git documentation sometimes call this the staging area or the cache.
Git uses the index copy of each file (or blob, really) to make new commits. When you run git add, Git reads the work-tree file, encodes it down into the blob form, and saves it—well, its hash ID, really—in the index. When you run git commit, Git simply freezes the index copies into committed copies.
When you run git checkout to switch to some commit, Git extracts the commit into the index (filling in all the blob hash IDs), and also extracts the blobs into the work-tree so that they are in useful format and you can work on them. When you run git add, Git compresses the work-tree file into its blob format and replaces the index entry for the file.
Transforming a blob into a work-tree file, or vice versa, is the ideal place where Git will do any conversions you need, such as turning newlines into CRLF line endings. So that's where Git does it: git checkout fills the index and expands-and-converts into the work-tree, and git add compresses-and-un-converts from the work-tree into the index, ready for the next git commit. (Any files you don't touch, stay compressed and ready to go, safely tucked away in the index.)
You already know that a tracked file is one that is in the index, and an untracked file is one that is in the work-tree but not in the index. Your goal is to use the existing .gitignore to make files that are currently in the index go away from the index if they would be .gitignore-ed. The process you are using is:
git rm -r --cached .: remove everything from the index, so that the entire work-tree is untracked
git add .: produce all new blobs in the index from whatever is in the work-tree, while ignoring any file that is listed in .gitignore.
The issue here is that what's in the work-tree has been converted by the "blob to work-tree" conversions, and will be "un-converted" by the "work-tree to blob" conversions. Creating a .gitattributes file with * -text tells Git: "The conversions to do are no conversions at all."
Unfortunately, it's too late: the git checkout you ran earlier, to get this commit into the work-tree, already did some conversions.
So here, we use step 1 to create a .gitattributes file that says do no conversions. Step 2, rm .git/index, removes the index entirely. Git now has no idea what's actually in the work-tree. This step may be unnecessary but I use it to force Git to act in step 3, which tells Git: extract every file from the HEAD commit into the index and the work-tree. This re-creates the index, and re-fills the work-tree, this time doing no conversions.
Steps 4 and 5 are just as before, but this time, the work-tree files all match the blobs in the HEAD commit since step 3 operated with the .gitattributes directive in place. Step 6 is to make sure you do not commit the "do no conversions" directive.

Related

Git lists files as changed but there are no changes

This is the umpteenth version of the extremely basic question "why the heck is Git telling me that files changed but diff shows no changes?". Similar questions have been posted here and here but none of those answers help.
My scenario is as follows:
I added a .gitattributes file to an existing Git repo with several already existing commits in it. The content of the .gitattributes file looks as follows:
* text=auto
*.bat text eol=crlf
*.cmd text eol=crlf
*.ps1 text eol=crlf
*.sh text eol=lf
*.csproj text eol=crlf
*.filters text eol=crlf
*.props text eol=crlf
*.sqlproj text eol=crlf
*.sln text eol=crlf
*.vcxitems text eol=crlf
*.vcxproj text eol=crlf
*.cs text
*.config text
*.jmx text
*.json text
*.sql text
*.tt text
*.ttinclude text
*.wxi text
*.wxl text
*.wxs text
*.xaml text
*.xml text
*.bmp binary
*.gif binary
*.ico binary
*.jpg binary
*.pdf binary
*.png binary
After adding that file I executed the following commands:
git rm --cached -r .
git reset --hard
The result is that git status now shows most of the files in the Git repo as modified. However, I cannot see any changes in any of those files. The diff tool isn't showing any changes, neither in the text view nor in its hex view.
The repo has been created on a Windows machine and I'm currently using it on a Windows machine. The output of the command git config --list is as follows:
http.sslbackend=schannel
diff.astextplain.textconv=astextplain
credential.helper=manager-core
core.autocrlf=true
core.fscache=true
core.symlinks=false
core.editor="C:\\Program Files\\Notepad++\\notepad++.exe" -multiInst -notabbar -nosession -noPlugin
pull.rebase=false
credential.https://dev.azure.com.usehttppath=true
init.defaultbranch=master
user.name=My Name
user.email=my#email.whatever
core.autocrlf=true
core.eol=crlf
diff.tool=bc
difftool.bc.path=C:/Program Files/Beyond Compare 4/bcomp.exe
difftool.bc.cmd="C:/Program Files/Beyond Compare 4/bcomp.exe" "$LOCAL" "$REMOTE"
difftool.bc.prompt=false
merge.tool=bc
mergetool.bc.path=C:/Program Files/Beyond Compare 4/bcomp.exe
mergetool.bc.cmd="C:/Program Files/Beyond Compare 4/bcomp.exe" "$LOCAL" "$REMOTE" "$BASE" "$MERGED"
mergetool.bc.keepbackup=false
mergetool.bc.trustexitcode=true
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
So the magic switches core.autocrlf and core.eol are as they should be for Windows as far as I could decrypt from the documentation.
Does anyone have a clue what Git landmine I've stepped on here?
There are multiple possibilities here, but the most common by far has to do with these CRLF line endings. It's complicated, and to really get it, we need some background first.
From a high level point of view, Git basically has two options:
Don't mess with line endings ever.
Do mess with line endings.
The first one is really simple, and is the default on all Unix-like systems. It's probably the default on Windows too, but I don't use Windows, so I'd have to defer to anyone else who says otherwise. In this setup, if you create a file and store, in that file, the byte-sequence:
h e l l o CTRL-M CTRL-J w o r l d CTRL-M CTRL-J
and then git add the file and run git commit, Git will store, in the repository, a new commit in which that file contains those 14 bytes. The blob hash ID will be:
$ printf 'blob 14\0hello\r\nworld\r\n' | shasum
23eb407b644b0e362fa224168ecd0adfa02b022a
This file has CRLF line endings. Extracting the commit will produce a file with CRLF line endings. The file in the repository is now read-only, frozen for all time; it has blob hash ID 23eb407b644b0e362fa224168ecd0adfa02b022a, as does every file in any Git repository anywhere in the universe, as long as that file contains exactly that text.
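If you want to convince yourself that the blob ID really is just a hash of a short header plus the raw bytes, a small sketch like this compares the hand-rolled hash with what Git computes. It assumes sha1sum is available (the shasum used above works the same way) and that your Git uses SHA-1 object IDs; the variable names are mine.

```shell
# Hash "blob <size>\0" followed by the 14 content bytes by hand.
manual=$(printf 'blob 14\0hello\r\nworld\r\n' | sha1sum | cut -d' ' -f1)

# Ask Git for the blob ID of the same 14 bytes.
# Without --path, hash-object applies no line-ending conversion.
via_git=$(printf 'hello\r\nworld\r\n' | git hash-object --stdin)

echo "$manual"
echo "$via_git"
```

The two printed values should be identical, and can be checked against the ID quoted above.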
Now suppose, having created this file (or not), we turn on the "do mess with line endings" options. We now get numerous sub-options, specifying just how Git will go about messing with line endings, when, on which files. These include eol=crlf, eol=lf, text, binary, and so on:
*.bat text eol=crlf
*.sh text eol=lf
*.jpg binary
This fragment tells Git that if the file's name ends with .bat, Git should mess with line endings in one particular way; if it ends with .sh, Git should mess with line endings in another particular way; and if it ends with .jpg, Git should not mess with line endings.
We know that the binary specification means that for such files, Git doesn't mess with line endings. This is good since, for instance, .jpg files do not actually have lines in the first place, so that anything that resembles a line ending is just coincidence. When Git isn't messing with anything, it's all easy: Git is storing what's there and showing you what's stored.
But that's no longer true for the other files. Since Git is now messing with their line endings, it becomes important to ask and answer more questions:
When exactly does Git mess with the line endings?
What exactly does Git do when it does this messing-about?
This is where things get complicated. The key to understanding things here is to know about Git's index. This thing—this "index"—is central in Git and you really do have to know about it to use Git properly, so let's take a tour of the index.
Git's index
Git's index is either so important or so poorly named (or both) that it actually has three names. It is also called the staging area, which refers to how you normally use it, and it is sometimes called the cache. This last name is pretty rare these days: you mostly see it in flags like git rm --cached. (Some commands, like git diff, have both --staged and --cached, with the same meaning. For some reason no one has gotten around to adding git rm --staged yet. I thought that would have happened by now, and I still think it will happen someday.)
The index does a bunch of things for Git, but here we really care about what it does for—and to—you. What it does for you is hold your proposed next commit. Git is, fundamentally, not about files, but rather about commits. Each commit holds files: in fact, each commit has a full snapshot of every file. (Each commit also has some metadata, such as the name and email address of the commit's author, but we'll skip that here.)
The thing about commits, though, is that they're purely read-only. You can make new ones, but you can never change any existing commit. The git commit --amend command, for instance, fakes it: it does not change the existing commit, it makes a new one and stops using the old one in favor of the new one instead. When you can't tell the difference—and sometimes you can't—this is just as good. When you can tell the difference—and sometimes you can—the cracks show through.
But if you can't change a commit—and you can't—and if, as is also true, the files inside a commit are in a special, compressed, de-duplicated, Git-only form that no programs other than Git itself can even read in the first place, how can you use the files that are inside a commit? The answer is simple enough: In order to use a commit, you have to have Git extract that commit first. We run git checkout or git switch to achieve this. Git extracts the files from the commit, placing usable versions of them in our working tree or work-tree, where we can see them and get our work done.
Git could stop here, with committed files—read-only inside the current commit, frozen for all time—and working files. Other version control systems do stop here. But Git doesn't. Instead, as it's extracting the commit, Git puts "copies" of each file into Git's index.
I put "copies" in quotes here because the files in Git's index are stored in the internal, compressed, de-duplicated format. Since they were just extracted from some commit, they take no space: they're de-duplicated away. They hold the same data in the index that they hold when they're inside the commit: this data is frozen for all time.
What's special about the index "copies" of files is that, unlike the committed copies, you can replace them. The git add command tells Git: compress and de-duplicate the working tree file. Git reads the working tree copy, compresses it, and checks to see if the compressed result is a duplicate of some existing file in any existing commit. (This is where that blob hash ID trick comes in: it's why any file consisting entirely of hello\r\nworld\r\n has hash ID 23eb407b644b0e362fa224168ecd0adfa02b022a.) If this is a duplicate, Git puts the duplicate's hash ID in the index. If it's not a duplicate, Git arranges to store a new blob in the object database,1 and stores the new blob's hash ID in the index.
Either way, after this update-the-index step, the proposed next commit is now updated. The file you git add-ed is now staged, and git status will compare the staged hash ID to the current-commit hash ID and say staged for commit if these hash IDs don't match. (This means that git add-ing a file that's been turned back to match the committed copy takes away the staged for commit message, even though the file will in fact be in the next commit. It's just that the hash IDs now match!)
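The parenthetical is easy to see in a scratch repository. This is only a sketch: the repository, file name and identity are invented, and git diff --cached --quiet is used as a stand-in for the "is anything staged?" part of git status.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
printf 'original\n' > file.txt
git add file.txt
git commit -q -m "initial"

# Stage a change: the index hash now differs from the HEAD hash.
printf 'changed\n' > file.txt
git add file.txt
if git diff --cached --quiet; then staged_after_change=no; else staged_after_change=yes; fi

# Put the old contents back and add again: the hashes match once more.
printf 'original\n' > file.txt
git add file.txt
if git diff --cached --quiet; then staged_after_revert=no; else staged_after_revert=yes; fi

echo "staged after change: $staged_after_change"
echo "staged after revert: $staged_after_revert"
```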
So, Git's index holds this proposed next commit. To make a new commit, you:
futz with the files in your working tree;
run git add on them to copy them back into Git's index; and
run git commit to package up whatever is in Git's index right then.
This is why you have to keep git adding a file each time you change it: Git doesn't automatically copy the working tree file back into the index. Git only copies it back when you say to do that.2
The end effect—and what you should take into the next section—is that, at all times, Git has three copies of each file:
  HEAD         index       work-tree
---------    ---------    ---------
README.md    README.md    README.md
img.jpg      img.jpg      img.jpg
main.py      main.py      main.py
for instance. The work-tree version is the one you can see, read, write, feed to a JPG viewer, run with the Python program, and so on. The other two are for Git: the HEAD version is the frozen-for-all-time copy from the current commit and the index version is the malleable-but-frozen-format copy, ready to go into the next commit.
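You can look at all three copies yourself. In this sketch (scratch repository, made-up contents and identity) the three copies of main.py are deliberately made different, then read back with git show HEAD:path for the committed copy, git show :path for the index copy, and plain cat for the work-tree copy.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'committed\n' > main.py
git add main.py
git commit -q -m "initial"      # HEAD copy says "committed"

printf 'staged\n' > main.py
git add main.py                 # index copy says "staged"

printf 'working\n' > main.py    # work-tree copy says "working"

head_copy=$(git show HEAD:main.py)
index_copy=$(git show :main.py)
worktree_copy=$(cat main.py)
echo "HEAD: $head_copy / index: $index_copy / work-tree: $worktree_copy"
```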
The git checkout or git switch command switches to some commit, copying the files out of the commit to Git's index and then to your working tree.
The git restore command reads a file from somewhere—a commit or the index—and writes it to the index and/or your working tree based on the -S (write to staging) and -W (write to work-tree) options.
The git reset -- file command reads a file from the HEAD commit and writes it to Git's index, leaving your working tree copy alone. (The -- here is a precaution, in case the name of the file is, say, master or dev or something that resembles a branch name).
The git add file command reads a file from your working tree and writes it to the index.
(Lots of alternatives are not listed here.)
So all these various commands are tricks for manipulating the index and/or working tree copy, in preparation for making the next commit (since Git is mostly about making new commits, while keeping all the old ones).
1Git actually stores the new compressed blob object immediately, even if it winds up being replaced before you make a new commit. This is okay (if perhaps sub-optimal in certain peculiar situations) because Git will run git gc for you now and then. Certain older Git versions had a bug where git gc didn't get run often enough, and this could actually be a problem, but that's been fixed for years now.
2Using git add -u tells Git to find modified working tree files, and add them, which automates the job. Using git commit -a is a lot like running git add -u && git commit: it runs a git add -u step before the commit. However, -a complicates things a bunch, and interacts badly with poorly-written pre-commit hooks, so it's kind of a bad idea. Try not to rely on it: use git add -u instead, in case you have one of these bad commit hooks. Or, learn to love the index, which lets you play clever tricks like git add -p, although this too interacts badly with poorly-written pre-commit hooks.
How and when Git messes with line endings
If:
Git is told to mess with line endings, and
a file is marked text, so that Git will mess with this file, or the text=auto setting is being used and Git guesses that this file is text
then:
Git will optionally mess with the file's bytes on the way from index to working tree (checkout or switch, restore, various kinds of reset, etc), and
Git will mess with the file's bytes on the way from working tree to index (add, mostly).
What messing-about will Git do? That depends on the eol= setting:
eol=crlf: On the way out, Git will change LF-only to CRLF. If a line reads hello\n in the index, Git will write hello\r\n to the working tree copy. On the way in, Git will change CRLF to LF-only. If a line reads hello\r\n in the working tree copy, Git will write hello\n to the index copy.
eol=lf: On the way out, Git will do nothing to the file. On the way in, Git will change CRLF to LF-only.
That's it—that's all Git will do! It won't ever change LF to CRLF on the way in, for instance. In that sense, we could say that Git "prefers" LF-only line endings. (If you want something fancier, you can write clean and smudge filters, which also operate on data "on the way in" and "on the way out" respectively, and here you can do whatever you like. But the built in stuff inside Git is limited to these few CRLF options.)
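Here is a small sketch of both directions with eol=crlf, in a scratch repository. The file name demo.txt and the variable names are invented; the index copy is compared against the ID of an LF-only version, and the re-extracted work-tree copy is checked for carriage returns.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '*.txt text eol=crlf\n' > .gitattributes

# On the way in: a CRLF work-tree file becomes an LF-only blob.
printf 'hello\r\nworld\r\n' > demo.txt
git add .gitattributes demo.txt
index_blob=$(git rev-parse :demo.txt)
lf_blob=$(printf 'hello\nworld\n' | git hash-object --stdin)

# On the way out: re-extracting the LF-only blob writes CRLF again.
rm demo.txt
git checkout -- demo.txt
cr_count=$(( $(tr -cd '\r' < demo.txt | wc -c) ))

echo "index blob:   $index_blob"
echo "LF-only blob: $lf_blob"
echo "carriage returns in work-tree copy: $cr_count"
```

The two blob IDs should match, and the work-tree copy should contain one carriage return per line.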
There's one more tricky bit: Git tries hard to optimize not making copies, in or out, of the index and working tree. This attempt usually works right, but it fails (by not making copies when it should make copies) if and when you switch around whether and how Git should mess with line endings. The tricks you linked to, where you rm .git/index for instance, are mostly ways to get around this. This forces Git to copy data, even in cases where Git thinks it doesn't need to copy data, even though the changed status of a file (from -text to text, or eol=lf to eol=crlf, or whatever) means that Git does have to copy.
This is all that you need to memorize. The remaining details can be worked out.
Consequences
Suppose you have a repository in which, in every commit that has text files, all committed copies have LF-only line endings. Since this is, in effect, Git's "preferred" format, the files are already all "OK". If you choose to have Git mess with files, all future commits will have LF-only line endings too, and the future commits will match the existing commits.
But suppose you have a repository in which some or all text files are committed with CRLF line endings. These commits are frozen for all time! You literally cannot change them. They will continue to have CRLF line endings. If you now begin choosing to have Git mess with files, future commits will gradually, or suddenly all at once, have some or all files with LF-only line endings, as stored in the repository.
Regardless of which of the above statements about the existing repository are true, your settings, should you set them, will affect how you see the files in your working tree, because to get into your working tree, Git has to extract the files from commits. But your file viewers might not show you what the ends of lines look like. That is, if your preferred file viewer displays a CRLF line and an LF-only line as identical, they'll look identical, even when they aren't.
The fact that the ends of lines "change" can make a change that Git considers a change. If the existing commits in the repository have CRLF line endings, and you start having Git mess with line endings, it's a good idea to do one "normalizing" commit. You will become the owner of every line of every file that is changed this way but git blame, at least, has a way to "skip over" a specific commit, if you need to figure out where some code came from. Since this "fix all files, but no real changes" commit doesn't do anything except normalize these lines, you can tell git blame to skip over it.
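One mechanism for this, available in Git 2.23 and later, is git blame --ignore-rev (there is also a blame.ignoreRevsFile setting for keeping a list). The sketch below builds a scratch repository with a CRLF commit followed by a normalizing commit and then asks blame to skip the latter. The repository, file name and identity are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git config core.autocrlf false   # keep this demo free of conversions

printf 'hello\r\nworld\r\n' > notes.txt
git add notes.txt
git commit -q -m "original content, CRLF endings"

printf 'hello\nworld\n' > notes.txt
git add notes.txt
git commit -q -m "normalize line endings"
normalize_commit=$(git rev-parse HEAD)

# Ask blame to look past the normalizing commit.
blame_out=$(git blame --ignore-rev "$normalize_commit" notes.txt)
echo "$blame_out"
```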
Note that Git (and git diff) do consider these lines different, unless you tell git diff to ignore certain white space changes:
--ignore-cr-at-eol: Ignore carriage-return at the end of line when doing a comparison.
-w, --ignore-all-space: Ignore whitespace when comparing lines.
(There are others; this is just a partial list.)
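As a quick sketch of the first option (scratch repository, invented file name and identity, Git 2.16 or later assumed for --ignore-cr-at-eol): the committed copy has LF-only endings, the work-tree copy has CRLF, and the two diffs are captured for comparison.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git config core.autocrlf false   # no conversions, so the CRs stay visible to Git

printf 'hello\nworld\n' > notes.txt
git add notes.txt
git commit -q -m "LF-only endings"

# Same text, but now with CRLF endings in the work-tree.
printf 'hello\r\nworld\r\n' > notes.txt

plain=$(git diff)
ignoring_cr=$(git diff --ignore-cr-at-eol)

echo "plain diff is $(( ${#plain} )) characters long"
echo "diff ignoring CR at EOL is $(( ${#ignoring_cr} )) characters long"
```

The plain diff should report every line as changed, while the second one should come out empty.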
Other items that should be mentioned here
When Git commits a file, it stores both the file's data and its "mode". Git has two modes for files, which it calls 100644 and 100755 when it shows them, but for which git update-index has a --chmod option that it spells -x and +x respectively. This tells Git that on a Unix-like system or any other system that has an equivalent, the 100755 or +x file should be marked executable at checkout.
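To see the stored mode and flip it, something like the following works in a scratch repository (the file name run.sh is made up). The first column of git ls-files --stage is the mode recorded in the index.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'echo hi\n' > run.sh
git add run.sh
mode_before=$(git ls-files --stage run.sh | cut -d' ' -f1)

# Change only the mode recorded in the index, not the file on disk.
git update-index --chmod=+x run.sh
mode_after=$(git ls-files --stage run.sh | cut -d' ' -f1)

echo "before: $mode_before, after: $mode_after"
```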
Most Windows file systems currently don't have an equivalent. In this case, Git tries to retain the chmod setting from the existing checkout. The rm .git/index trick defeats this "retain the old setting" trick. So it's possible to change the mode of files when fixing end-of-line issues. This is why it's better to use git add --renormalize after changing CRLF line endings settings, if your Git supports this.
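A minimal sketch of the --renormalize route, assuming Git 2.16 or later and using a scratch repository with invented names: a file is committed with CRLF endings, a .gitattributes marking it as text is added afterwards, and git add --renormalize re-stages the tracked file under the new rule. Since --renormalize only touches files that are already tracked, the new .gitattributes is added separately.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"
git config core.autocrlf false

# Commit a file with CRLF endings, before any attributes exist.
printf 'hello\r\nworld\r\n' > notes.txt
git add notes.txt
git commit -q -m "CRLF endings committed as-is"

# Now declare the file to be text and renormalize what is tracked.
printf '*.txt text\n' > .gitattributes
git add .gitattributes
git add --renormalize .

staged=$(git rev-parse :notes.txt)
expected=$(printf 'hello\nworld\n' | git hash-object --stdin)
echo "staged blob:  $staged"
echo "LF-only blob: $expected"
```

If the two IDs match, the staged copy is the LF-only version, ready for the one "normalizing" commit.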
The general idea that there are some changes, or features of files, that are invisible or hard to see is a little weird, but we have non-computing examples: for instance, in fine typesetting, we have the hyphen (-), the en-dash (–), and the em-dash (—). These may or may not display on your computer as different width dashes. We have other computer examples, such as the Whitespace programming language or the terrible mistake with makefile syntax (where tabs are significant). And, in spycraft—whether or not we use computers—we have steganography.

Store several different .git branches in the same directory?

There was no local .git repository on Windows in a "..Downloads/Training" folder. Using the pre-installed Git Bash, I first typed
git init
then created a .gitignore with touch .gitignore, in accordance with the YouTube tutorial (see link listed below), and copied into it the .gitignore contents from a repository previously created by Visual Studio. I added the .gitignore with
git add .gitignore
added the only subfolder, M01, in "..Downloads/Training" by typing
git add .
committed the changes with
git commit -am "First commit"
added the remote by typing
git remote add origin https:name_of_remote.com/my_repository_folder
created a branch with
git branch M01
switched to the branch with
git switch M01
and pushed the repo by typing
git push origin HEAD:M01
And the repository has been pushed successfully into that remote but now there is a problem:
I need to store the contents of each folder inside "..Downloads/Training" in a separate branch on the remote.
So if I create a new local folder M02 and a branch by typing git branch M02, then switch to it with git switch M02, it shows me all of the contents that I previously added to the M01 branch in the M02 branch. But if I remove the files from M02 by typing git rm . -r (which deletes the local files), it also deletes the files from both the M01 branch and the M02 branch.
Is there a way to store only M01 local folder in a M01 branch and the M02 local folder in a M02 branch?
Additional source tutorial links:
(https://www.youtube.com/watch?v=g4BJXfmAevA)
You have learned some things that are wrong. Note: I have not watched the particular youtube tutorial you linked, so I'm not sure if it is good, bad, or indifferent. I base this paragraph solely on what you wrote in your question.
First, let's address what a Git repository is and is not:
A Git repository is a collection of commits. These commits are to be found using branch names, tag names, and other names, but it's not a collection of branches. It's a collection of commits. A loose analogy might be a collection of insects, where you label the insects: that makes the collection contain labels, but it's not a collection of labels. That is, the labels aren't the purpose of the collection.
The actual repository proper is stored in the .git directory (or folder, if you prefer that term). The existence of this folder, with specific files and sub-folders that Git will need, is what tells Git that this is a repository. If the .git directory itself is absent, or is missing these crucial files and/or sub-folders, Git will say that this is not a Git repository.
A bare repository starts and ends with the .git folder (which is then often named repo.git or some such, rather than just .git). Technically a bare repository still has an index / staging-area, but that's just because Git's implementation of the index / staging-area is mainly a file in the .git directory, named index. (This file does not have to exist: Git will create it if needed.)
A non-bare repository, which is the kind you would normally work with, also has a working tree. The working tree is where you see and work with your files. These files are ordinary, everyday files on your computer, stored in folders the way your computer likes to store files. It's important to understand that these working tree files are not in the Git repository.
The repository—the .git folder—stores commits and other internal Git objects. These take the form of an object database, in which Git looks up objects by their hash IDs: large, random-looking numbers, expressed in hexadecimal. All of these objects are in fact read-only: once stored in the database, no object can ever be changed. This means no commit or file can ever be changed.
The repository also stores a separate database of names: branch names, tag names, remote-tracking names, and other names. Each of these stored names holds exactly one hash ID. For branch names, the stored hash ID is always that of a commit. (Tag names are allowed to store other internal-Git-object hash IDs, that you don't normally interact with as directly.) This serves as a good (and fast) way to find particular commits.
Finding, extracting, using, and creating new commits is much of what we do with a Git repository. Since the repository is so commit-centric—and what we do with the repository is definitely commit-centric—it's important to know what a commit is and does, and how Git extracts one:
Each commit stores a full snapshot of all of your files. The files inside the commits are not in the ordinary computer file form.1 Because they are stored as Git objects, they're all read-only. Not even Git can change them. And, because the commit itself is a Git object, it has one of those big ugly hash IDs, so that Git can look it up in the database of objects.
Each commit also stores some metadata: information about the commit itself, such as who made it and when. In this metadata, Git stores the hash ID of a previous commit. More precisely, most commits store the hash ID of one previous commit, but there are some exceptions to this rule.
Because each commit is read-only, but you normally need to read and write your files, you'll tell Git to extract some commit. When you do that, Git will copy the files out of the commit. To do so, Git will read out the internal-only, read-only data from the internally stored files (with their internally-stored names) and turn that into the ordinary file-in-folder setup that your computer uses. Those files will go into your working tree.
In other words, when you run git checkout or git switch, Git performs an optimized (and safety-checked) variant of the following two-step sequence:
First, Git removes all the (tracked) files from your working tree.
Then, Git replaces all the files in your working tree with those from the commit you just checked out.
This is why each commit stores every file: because switching from one commit to another will remove all the files from the commit you're leaving behind, and extract, instead, all the files from the commit you're moving to. The commits themselves are completely read-only, so no commit changes in this process, and no files are lost. Only your working tree is emptied and re-filled.
Note that this talks about switching from one commit to another. Switching from one branch to another doesn't necessarily switch commits. To understand this correctly, we should draw a picture of the commits.
1They are stored in a special, read-only, Git-only form, where they are compressed and—important for Git's internal operations—de-duplicated. This de-duplication step allows every commit to store every file, without taking any extra space. In fact, many different files—plus other internal Git objects—may be stored inside a single computer file (named with some big ugly hash ID followed by .pack), although sometimes file contents are stored as what Git calls loose objects. Either way, though, these files do not have ordinary file names, either.
Drawing pictures of commits in branches
Each commit has some big, ugly, random-looking hash ID. Rather than using these, let's use single uppercase letters to stand in for the hash IDs. And, most commits contain the hash ID of some previous commit. Rather than using that, let's draw an arrow, pointing from the later commit to the previous commit. We'll start with some commit whose hash ID is H:
<-H
The commit H points backwards to, we'll call commit G:
<-G <-H
G of course points to another, still-earlier commit, so we have to keep going:
... <-F <-G <-H
Eventually we'll have a chain of commits that points all the way back to the very first commit ever. Let's call that A here, and draw in all the commits—but I'll get a bit lazy now and use lines to connect them, even though the arrows really only point backwards. Git can't go in and adjust the earlier commit to point forwards, because the earlier commit is frozen for all time, once it's made. So we have:
A--B--C--D--E--F--G--H
Here, commit H is the last commit in the chain. From H we—or Git—can work our way all the way back to the first commit in the chain, commit A; there, everything stops, because there are no earlier commits.
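The backwards-pointing links are visible in the commits themselves. This sketch makes a three-commit chain in a scratch repository (names and identity invented), reads the parent line out of the last commit with git cat-file -p, and counts how many commits are reachable by walking back from it.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Build a short chain A--B--C.
for name in A B C; do
  printf '%s\n' "$name" > file.txt
  git add file.txt
  git commit -q -m "commit $name"
done

# The last commit stores the hash ID of its parent in its metadata.
parent_line=$(git cat-file -p HEAD | sed -n 's/^parent //p')
first_parent=$(git rev-parse HEAD^)

# Walking backwards from the last commit reaches every commit in the chain.
chain_length=$(git rev-list --count HEAD)

# The very first commit has no parent line at all.
root=$(git rev-list --max-parents=0 HEAD)
root_parent=$(git cat-file -p "$root" | sed -n 's/^parent //p')

echo "parent stored in last commit: $parent_line"
echo "commits reachable from HEAD:  $chain_length"
```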
To find commit H quickly, we give it a branch name, like main or master.2 To represent this, let's put in the name, with an arrow coming out of it, pointing to commit H:
...--G--H <-- main
If you now create a new name, such as M01, that name selects the same commit, by default. We now have two names for commit H:
...--G--H <-- main, M01
Your own Git software, working in your own repository—"your Git" for short—can only be on one of these two branches. To represent which one you're on, let's attach the special name HEAD to one of these two branch names:
...--G--H <-- main (HEAD), M01
If we now run:
git switch M01
Git will move HEAD over to the name M01:
...--G--H <-- main, M01 (HEAD)
We're still using commit H, because both names select the same commit. The set of files in your working tree will remain exactly the same, because we didn't change commits.
Now suppose we remove a bunch of files (perhaps an entire folder-full of files) from the working tree, create some new files, run git add . to stage those removals and additions, and then run git commit. When we make this new commit, Git will save all the files that are in Git's staging area at that time. The git add . updated the staging area to match our working tree.3 This will make a new commit, which will get some random-looking hash ID, but we'll just call it "commit I". Let's draw it in:
...--G--H
         \
          I
Commit I has, as its parent, earlier commit H. That's because we were using commit H to make commit I; we were, and still are, on branch M01 as git status will say here, and just a moment ago—before we ran git commit—the name M01 pointed to commit H. But now that commit I exists, the git commit command wrote I's hash ID into the current branch name, so what we have is this:
...--G--H   <-- main
         \
          I   <-- M01 (HEAD)
Commit I contains the files we told Git it should have; branch name M01 selects commit I; and HEAD tells us we're on branch M01. If we now run:
git switch main
Git will remove the files that go with commit I and extract the files that go with commit H. Our working tree will now match commit H and we will have:
...--G--H   <-- main (HEAD)
         \
          I   <-- M01
as our commit picture.
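Here is a minimal sketch that reproduces the pictures above in a throwaway repository. The file names, the commit messages, and the placeholder identity are my own assumptions for illustration, and git switch needs Git 2.23 or later.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"                          # scratch directory, safe to delete
git init -q
git symbolic-ref HEAD refs/heads/main      # call the initial branch "main"
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "H"                       # stands in for commit H

git branch M01                             # second name for the same commit
git switch -q M01                          # moves HEAD only; files are untouched
h_main=$(git rev-parse main)
h_m01=$(git rev-parse M01)
echo "main=$h_main M01=$h_m01"             # both names select the same hash

git rm -q file.txt                         # remove a file ...
echo two > other.txt                       # ... create another ...
git add .                                  # ... and stage both changes
git commit -q -m "I"                       # M01 now names I; main still names H

git switch -q main                         # swaps out I's files, swaps in H's
ls                                         # only H's file is present again
```

Running git log --all --oneline --graph --decorate at the end draws the same fork Git's way.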
2Your own Git probably defaults to master while GitHub now defaults to main, which creates some issues later.
3I've skipped over some important details about .gitignore files, tracked files, untracked files, and how the index works here, so as to concentrate on the commits-and-branches.
What you've gotten wrong
So if I create a new local folder M02 and a branch by typing git branch M02, switching to it by git switch M02, it shows me all of the contents that I have previously added into the M01 branch in the M02 branch ...
The new branch and the old branch currently share the same final commit.
What you see when you look at your working tree are just your working tree files. They're not the same as the committed files, although if git status says that everything matches, they do match the committed files. The committed files do not—cannot—change, ever.
Note that Git stores only files. The files in the commit at the end of your M01 branch may have names like M01/somefile. That's the file's actual name: M01/somefile. It's not a folder named M01 containing a file named somefile. Your working tree has this setup—folders with files in them—and in your working tree, the slashes might even go the other way, M01\somefile. In the commit, though, it's just a file named M01/somefile. This particular point is usually not important, but it means Git is literally incapable of storing an empty folder (because an empty folder contains no files, and Git can only store files).4
but if I remove the files from M02 by typing git rm . -r (it deletes local files), it also deletes the files from both the M01 branch and the M02 branch.
The git rm -r . operation clears out Git's index / staging-area and your working tree. The existing commits do not—and cannot—change. A future commit that you make right now will contain no files (because you removed all the files); as you put some files back and git add them, the future commit will contain those files.
Git's index / staging-area thus acts as your proposed future commit. You create a file in your working tree and run git add to copy it into Git's staging area. You git checkout or git switch to some existing commit to tell Git: remove all the current committed files and swap in the files from the other commit. This is completely safe if you've committed everything; it's not safe if you have uncommitted work, and git checkout or git switch will normally detect these not-safe cases and avoid clobbering the uncommitted work.5
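To see that git rm -r . touches only the index and the working tree, here is a small sketch in a scratch repository. The path M01/somefile comes from the discussion above; the commit message and the placeholder identity are assumptions.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"

mkdir M01
echo data > M01/somefile
git add .
git commit -q -m "add M01/somefile"
before=$(git rev-parse HEAD)

git rm -r -q .                             # empties the index and the working tree
git ls-files                               # prints nothing: the proposed commit is empty
git ls-tree -r --name-only HEAD            # the existing commit still lists M01/somefile
after=$(git rev-parse HEAD)                # same hash: the commit did not change

git checkout HEAD -- .                     # copy the committed files back out
```

The last command refills both the index and the working tree from the unchanged commit.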
4There are some tricks with Git submodules that can be used here, but let's not get into those.
5Note that git checkout has a "dangerous" mode of operation that will clobber unsaved work. The new git switch command implements the safe part of git checkout, while the new git restore command implements the unsafe part of git checkout. So it's probably better to learn the new git switch and git restore, so that you always know if you're running a safety-checking command.
For various reasons, both "safe" commands will sometimes let you switch branches even with unsaved work. When they will let you switch, but you forgot to save (add-and/or-commit) first, you can just switch back. This gets quite complicated when one goes into full detail. See Checkout another branch when there are uncommitted changes on the current branch if you really want to know.
Some further detail
ElpieKay's answer shows you how to create more than one root commit. A root commit is a commit like our commit A above: the very first one in a new, empty repository, that has no parent commit. Obviously that first commit can't have a previous commit, so Git has to be able to create a root commit. Using git checkout --orphan or git switch --orphan is another way to create a root commit.
There is a small but critical difference between checkout and switch here:
git checkout --orphan newbranch
leaves Git's index / staging-area full of files (and does not touch your working tree at all), but:
git switch --orphan newbranch
empties out Git's index / staging-area (and as a result, removes any corresponding tracked files from your working tree). If you intend to remove all tracked files, using git switch --orphan saves you this step.
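The difference is easy to observe in a scratch repository. This is only a sketch: the branch names keep-index and empty-index, the file name, and the placeholder identity are made up for the demonstration, and git switch needs Git 2.23 or later.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"

echo data > tracked.txt
git add tracked.txt
git commit -q -m "first"
start=$(git symbolic-ref --short HEAD)     # whatever the initial branch is called

git checkout -q --orphan keep-index        # hypothetical branch name
kept=$(git ls-files)                       # index still lists tracked.txt
git switch -q "$start"                     # go back; the unborn branch vanishes

git switch -q --orphan empty-index         # hypothetical branch name
emptied=$(git ls-files)                    # index is now empty
echo "after checkout --orphan: '$kept'; after switch --orphan: '$emptied'"
```

After the second command tracked.txt is also gone from the working tree, although it is still safely stored in the first commit.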
In your case, a good solution is to create an empty root commit from the very beginning. And whenever you want to start a new branch from scratch, just create it from the root commit.
# create the 1st empty root commit
git init
git commit --allow-empty -m"root of all"
# tag it so that you don't have to use its sha1 value which is hard to memorize
git tag root
# create a new branch that tracks no files
git branch newbranch root
It's not too late to create an empty root commit now. Since your repository is not empty, we need a different method to create the root commit.
# create an orphan branch foo from the current branch
git checkout --orphan foo
# remove all cached files and directories
git rm --cached -rf .
# create the empty root commit
git commit --allow-empty -m "root of all"
# tag it
git tag root
# remove the orphan branch foo
git branch -D foo
And here's a 3rd method to create an empty root commit and tag it,
git tag root $(git commit-tree -m"root of all" $(git mktree < /dev/null))
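Whichever method you use, you may want to confirm that root really is a parentless commit with no files. A rough check, run here in a scratch repository with a placeholder identity (both assumptions), could look like this:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"

# 3rd method from above: build the empty tree, wrap it in a commit, tag it
git tag root "$(git commit-tree -m "root of all" "$(git mktree </dev/null)")"

# rev-list --parents prints the commit followed by its parents;
# exactly one word means there is no parent
words=$(git rev-list --parents -n 1 root | wc -w)
files=$(git ls-tree -r root)               # empty output: no files in the commit

git branch newbranch root                  # a new branch that tracks no files
echo "words=$words files='$files'"
```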

Git - rename files with illegal characters

Our team works on a repository that includes one directory containing files with the pipe character "|" in their names. I'm the only one on Windows, where the pipe character is illegal in file names.
Is there a way to rename files in the directory "extra" from e.g. "2021|08|05" to "2021\08\05" when I run "git pull"?
Is there a way to rename files in the directory "extra" from e.g. "2021\08\05" to "2021|08|05" when I run "git push"?
No.
No.
These are the right answers to the question you've asked, but the trick is, you've asked the wrong question. Alas, the answer to the right question is: "yes, but it's a horrible solution". The question is: Is there a solution to the problem of bad / invalid characters in file names stored in Git commits?
In the early days of Git, when it was a collection of shell scripts that only a few people could use successfully, 😀 one would obtain new commits from elsewhere with git fetch, then read these commits into Git's index with git read-tree.
Git's index, which Git also calls the staging area or sometimes the cache, can hold these file names. In fact, even on Windows, Git's index can hold files named aux.h, which Windows won't let you create. The index has no folders either: it just has files with names with embedded (forward) slashes, such as path/to/file. Git can hold two different files, one named README and one named readme, in its index. Windows can't have two different files whose names differ only in case.
So, Git's index / staging-area can hold these files just fine. The problem comes when you go to work with the files. Files that are in Git's index, are stored there in a special Git-only format, as what Git calls a blob object. You cannot read or write a blob object directly. You have to use yet more Git commands to do this. It's terribly inconvenient.
To use Git conveniently, then, we normally don't use all the individual one-step-at-a-time internal Git operations: we use some sort of higher level, user oriented command, like git checkout. We check out an entire commit: Git will find all the files that are stored in that commit, read them into Git's index, and copy out and expand all the internal Git-only blob objects into ordinary files, with ordinary file names.
This step (copying files out of Git's index, to make them usable) is where things go wrong on Windows. The file's name in Git's index is, say, path/to/2021|08|05. Git recognizes that path/to/ has to be turned into two folders, path\ and path\to\, on Windows, so that Git can create a file in the second folder. Unfortunately, Git has no way to remap the 2021|08|05 part. That part is going to stay 2021|08|05, and as you have seen, Git can't create a file with that name: the OS just says "no".
What you can do, at this point, is drop down to those lower-level commands. You can run:
git rev-parse :path/to/2021|08|05
perhaps with quotes if needed, depending on your shell:
git rev-parse ":path/to/2021|08|05"
This git rev-parse command will show the blob hash ID for the file. You can then access the file's contents with:
git cat-file -p <hash>
which prints those contents to the standard output. If your shell supports redirection, you can then redirect the output to a file whose name is your choice. This lets you see and use the file's contents.
The git cat-file -p command can take the index path name directly, so:
git cat-file -p ":path/to/2021|08|05" > path/to/2021-08-05
is a way to extract the file to a usable name.
Unfortunately, git add—which is how you would normally update the file—will insist on using the name you gave the file in the file system. Once again, you must fall back on internal Git plumbing commands to work around this. If you need to update that particular file, for instance, you would:
1. run git hash-object -w -t blob path/to/2021-08-05 to turn the updated file's data into an internal Git object;
2. run git update-index with arguments to make Git update the entry for path/to/2021|08|05 using the hash ID obtained in step 1.
Once all of this is done, you can go back to normal Git commands, because git commit makes a new commit from what's in Git's index / staging-area.
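As a rough sketch, the whole extract / edit / store-back round trip might look like the following. It runs in a scratch repository; the extra/ directory and the safe name 2021-08-05 follow the question, while the file contents, the 100644 mode, and the placeholder identity are assumptions. I have not run this on Windows.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"

# Simulate the upstream state: an index entry whose name contains "|",
# created without ever writing a working-tree file of that name.
old=$(printf 'old contents\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo "100644,$old,extra/2021|08|05"
git commit -q -m "upstream commit with a pipe in a file name"

# Extract the contents under a name the OS accepts, then edit that copy.
mkdir -p extra
git cat-file -p ":extra/2021|08|05" > extra/2021-08-05
printf 'new line\n' >> extra/2021-08-05

# Step 1: store the edited data as a blob.  Step 2: point the index entry
# with the original name at that blob.
new=$(git hash-object -w -t blob extra/2021-08-05)
git update-index --cacheinfo "100644,$new,extra/2021|08|05"

git commit -q -m "update the file via plumbing commands"
git cat-file -p "HEAD:extra/2021|08|05"    # shows the updated contents
```

The working copy extra/2021-08-05 stays untracked the whole time, so be careful not to git add it by accident.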
The (rather large) drawback here is that you cannot use a lot of normal everyday Git commands:
1. git pull is often a no-go because it runs git rebase or git merge, both of which need to use your working tree (OS-level files). Run git fetch first, then do as much manual work as needed.
2. git checkout will fail: you can use it, but then you must manually do something about each of the bad file names that are now in Git's index.
3. git diff will show differences that include deleting the files with the bad names, and git status will show the adjusted-name files as untracked files (because they are).
4. git add of any changes you need to make to these files is also a no-go; use git hash-object -w and git update-index instead.
5. git rebase and git merge become difficult. You probably can deal with them as in steps 2 and 4, but that's painful at best.

Git: Reconcile Two Folders with Duplicate Name (Cased Different) in Windows While Preserving History

I recently discovered that there are a couple folders in my solution that have two distinct paths in Git (GitHub shows two separate folders), one being FooBar and the other being Foobar. This is because some files were registered with the former folder name as their path, and some with the latter.
This was discovered locally (in Windows) by configuring Git to not ignore case: git config core.ignorecase false
I took a stab at fixing this by deleting the whole folder, committing, then re-adding the folder and committing again. This fixed the problem, but the files that got their paths changed lost their Git History. Running gitk against the new path for these files showed just the one commit. Running gitk against their old path revealed their whole history.
Next stab: Use git mv to move the file:
git mv Foobar/file.txt FooBar/file.txt
This yields the error:
fatal: destination exists, source=Foobar/file.txt, destination=FooBar/file.txt
And if I try deleting the file first, of course Git complains that the source file doesn't exist.
Then I discovered Git doesn't complain about the destination already existing if you add -f to the mv command. However, after committing that rename, gitk shows that the history got severed anyway!
I even attempted to do the three step dance described here but this was just another way of doing the -f. Same result.
Basically I just want to move a file from Foobar/file.txt to FooBar/file.txt in a case-insensitive operating system in some way, while preserving Git history. Is this possible?
There is no simple solution to the real problem.
In Git, files don't have history. Commits have history—or more precisely, commits are the history. That is all the history there is. For Git to "follow" a file, as in git log --follow <path>, Git looks at the commits, one at a time, comparing each commit to its parent commit.
If a diff between parent and child shows that the parent contains a file named parent/path/to/pfile and the child contains a file named child/path/to/cfile and the content of these two files, in these two commits, is "sufficiently similar" (several conditions must hold here), then, in Git's "eyes", that parent-to-child transition represents a rename of that file. So at that point, git log --follow, which had been looking for child/path/to/cfile, starts looking instead for parent/path/to/pfile.
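A quick way to watch this happen is to rename a file in a scratch repository and compare the two forms of git log. The file names pfile and cfile echo the paths above; the contents, commit messages, and placeholder identity are assumptions.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"

printf 'line 1\nline 2\nline 3\n' > pfile
git add pfile
git commit -q -m "add pfile"

git mv pfile cfile                         # same content, new name
git commit -q -m "rename pfile to cfile"

# Without --follow, only the commit that introduced the name cfile is found.
plain=$(git log --oneline -- cfile | wc -l)
# With --follow, Git detects the rename and keeps going under the old name.
followed=$(git log --oneline --follow -- cfile | wc -l)
echo "plain=$plain followed=$followed"
```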
Without --follow, git log does not do this special "find a rename" operation ... and in general, Git believes that any path names with any byte-level difference represent different files. In other words, case-folding and UTF-8 normalization do not occur. Consider, e.g., the word schön, which can be represented as either s c h ö n or s c h o combining-¨ n. We can, on a Linux box, create two different files using these two different UTF-8 style names. Running ls will show two files whose name appears the same:
$ cat umlaut.py
import os
p1 = u'sch\N{latin small letter o with diaeresis}n'
p2 = u'scho\N{combining diaeresis}n'
os.close(os.open(p1.encode('utf8'), os.O_CREAT, 0o666))
os.close(os.open(p2.encode('utf8'), os.O_CREAT, 0o666))
$ python umlaut.py
$ ls
schön schön umlaut.py
Git is perfectly happy to store both files, separately. However, MacOS refuses to allow both files to coexist, in the same way that Windows—and for that matter, MacOS by default as well—refuses to allow both Foobar and FooBar to coexist.
Make Git store the file in new commits under the new byte-sequence, and history is preserved, it's just not the history you want preserved. But the history that's already in the repository is already not the history you want preserved.
In practice, you should probably just rename the file in Git's eyes—which has no effect on the file's name in your OS's eyes; FooBar and Foobar are the same name here—and get on with things. Your alternative is to rewrite all history going back in time to the point at which the bad pairings were first added to the repository, by copying (with slight modifications) each "bad" commit to a new-and-improved "good" commit. But this then means getting everyone who uses the repo to switch from "bad old repo" to "new and improved good repo".
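One way to rename the file "in Git's eyes" only is to rewrite the index entry directly, leaving the working tree alone. The following is a sketch under assumptions: I ran the idea only on a case-sensitive file system, not on Windows, and the commit messages and placeholder identity are made up.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"

mkdir Foobar
echo text > Foobar/file.txt
git add .
git commit -q -m "file stored under Foobar"

# An index entry looks like "<mode> <hash> <stage><TAB><path>".
entry=$(git ls-files --stage -- Foobar/file.txt)
mode=${entry%% *}                          # first field: the file mode
blob=$(git rev-parse ":Foobar/file.txt")   # blob hash of the staged file

git update-index --force-remove Foobar/file.txt                   # drop the old spelling
git update-index --add --cacheinfo "$mode,$blob,FooBar/file.txt"  # add the new spelling
git commit -q -m "Rename Foobar/file.txt to FooBar/file.txt"

git ls-tree -r --name-only HEAD            # the new commit lists FooBar/file.txt
```

Because the blob is identical in both commits, git log --follow can treat the change as a rename when it compares the new commit to its parent.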

.gitattributes don't work properly on mac and windows

On my project I use computers with different operating systems: one is a Mac and the other runs Windows. When I use Git, every change is shown as a change to the whole document. The reason is the different end-of-line conventions of these two operating systems. I read https://help.github.com/articles/dealing-with-line-endings/ and made a .gitattributes file in the root folder, but the problem still exists. This is my .gitattributes file:
# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto
# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout.
*.css text
*.html text
*.js text
# Declare files that will always have CRLF line endings on checkout.
*.sln text eol=crlf
# Denote all files that are truly binary and should not be modified.
*.png binary
*.jpg binary
I have no idea why it's not working; I have already tried many configurations of this file.
The .gitattributes file should be added with the first commit. If you add it a few commits in, you need to normalize all the existing files explicitly.
$ rm .git/index # Remove the index to force Git to
$ git reset # re-scan the working directory
$ git status # Show files that will be normalized
$ git add -u
$ git commit -m "Introduce end-of-line normalization"
See https://git-scm.com/docs/gitattributes
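Newer Git versions (2.16 and later, as far as I know) also offer git add --renormalize, which re-applies the line-ending conversion to every tracked file without removing the index. Here is a sketch in a scratch repository; the file name, its contents, and the placeholder identity are assumptions.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Demo"                # placeholder identity (assumption)
git config user.email "demo@example.com"
git config core.autocrlf false             # so the CRLF file is stored as-is

printf 'a\r\nb\r\n' > file.txt
git add file.txt
git commit -q -m "CRLF committed before .gitattributes existed"
before=$(git ls-files --eol -- file.txt)   # index side reported as i/crlf

echo '* text=auto' > .gitattributes
git add --renormalize .                    # re-run the conversion on tracked files
git add .gitattributes                     # --renormalize skips untracked files
git commit -q -m "Introduce end-of-line normalization"
after=$(git ls-files --eol -- file.txt)    # index side now reported as i/lf

printf '%s\n%s\n' "$before" "$after"
```

git ls-files --eol is handy on its own too: it shows, per file, the line endings in the index, in the working tree, and the attributes in effect.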
If the .gitattributes file was not added with the first commit, perform the following steps to apply the attributes locally:
Go to the root of the repository
Check status:
git status
If it says "nothing to commit, working tree clean", perform:
git rm --cached -r .
git reset --hard
The answer is based on https://dev.to/deadlybyte/please-add-gitattributes-to-your-git-repository-1jld
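Before re-checking out everything, it can help to confirm that Git reads the .gitattributes file the way you expect. git check-attr reports the attributes for any path, even one that does not exist yet. In this sketch the .gitattributes content is the one from the question, while the queried file names are made up.

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
git init -q

cat > .gitattributes <<'EOF'
* text=auto
*.css text
*.html text
*.js text
*.sln text eol=crlf
*.png binary
*.jpg binary
EOF

# Hypothetical file names; check-attr does not need the files to exist.
css_text=$(git check-attr text -- style.css)
sln_eol=$(git check-attr eol -- app.sln)
png_text=$(git check-attr text -- logo.png)
txt_text=$(git check-attr text -- notes.txt)
printf '%s\n' "$css_text" "$sln_eol" "$png_text" "$txt_text"
```

If a path reports something other than what you intended, the pattern order in .gitattributes is the first thing to look at, since later matching lines override earlier ones.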

Resources