This is the umpteenth version of the extremely basic question "why the heck is Git telling me that files changed but diff shows no changes?". Similar questions have been posted here and here but none of those answers help.
My scenario is as follows:
I added a .gitattributes file to an existing Git repo with several already existing commits in it. The content of the .gitattributes file looks as follows:
* text=auto
*.bat text eol=crlf
*.cmd text eol=crlf
*.ps1 text eol=crlf
*.sh text eol=lf
*.csproj text eol=crlf
*.filters text eol=crlf
*.props text eol=crlf
*.sqlproj text eol=crlf
*.sln text eol=crlf
*.vcxitems text eol=crlf
*.vcxproj text eol=crlf
*.cs text
*.config text
*.jmx text
*.json text
*.sql text
*.tt text
*.ttinclude text
*.wxi text
*.wxl text
*.wxs text
*.xaml text
*.xml text
*.bmp binary
*.gif binary
*.ico binary
*.jpg binary
*.pdf binary
*.png binary
After adding that file I executed the following commands:
git rm --cached -r .
git reset --hard
The result is that git status now shows most of the files in the Git repo as modified. However, I cannot see any changes in any of those files. The diff tool isn't showing any changes, neither in the text view nor in its hex view.
The repo has been created on a Windows machine and I'm currently using it on a Windows machine. The output of the command git config --list is as follows:
http.sslbackend=schannel
diff.astextplain.textconv=astextplain
credential.helper=manager-core
core.autocrlf=true
core.fscache=true
core.symlinks=false
core.editor="C:\\Program Files\\Notepad++\\notepad++.exe" -multiInst -notabbar -nosession -noPlugin
pull.rebase=false
credential.https://dev.azure.com.usehttppath=true
init.defaultbranch=master
user.name=My Name
user.email=my#email.whatever
core.autocrlf=true
core.eol=crlf
diff.tool=bc
difftool.bc.path=C:/Program Files/Beyond Compare 4/bcomp.exe
difftool.bc.cmd="C:/Program Files/Beyond Compare 4/bcomp.exe" "$LOCAL" "$REMOTE"
difftool.bc.prompt=false
merge.tool=bc
mergetool.bc.path=C:/Program Files/Beyond Compare 4/bcomp.exe
mergetool.bc.cmd="C:/Program Files/Beyond Compare 4/bcomp.exe" "$LOCAL" "$REMOTE" "$BASE" "$MERGED"
mergetool.bc.keepbackup=false
mergetool.bc.trustexitcode=true
core.repositoryformatversion=0
core.filemode=false
core.bare=false
core.logallrefupdates=true
core.symlinks=false
core.ignorecase=true
So the magic switches core.autocrlf and core.eol are as they should be for Windows, as far as I could decipher from the documentation.
Does anyone have a clue what Git landmine I've stepped on here?
There are multiple possibilities here, but the most common by far has to do with these CRLF line endings. It's complicated, and to really get it, we need some background first.
From a high level point of view, Git basically has two options:
Don't mess with line endings ever.
Do mess with line endings.
The first one is really simple, and is the default on all Unix-like systems. It's probably the default on Windows too, but I don't use Windows, so I'd have to defer to anyone else who says otherwise. In this setup, if you create a file and store, in that file, the byte-sequence:
h e l l o CTRL-M CTRL-J w o r l d CTRL-M CTRL-J
and then git add the file and run git commit, Git will store, in the repository, a new commit in which that file contains those 14 bytes. The blob hash ID will be:
$ printf 'blob 14\0hello\r\nworld\r\n' | shasum
23eb407b644b0e362fa224168ecd0adfa02b022a
This file has CRLF line endings. Extracting the commit will produce a file with CRLF line endings. The file in the repository is now read-only, frozen for all time; it has blob hash ID 23eb407b644b0e362fa224168ecd0adfa02b022a, as does every file in any Git repository anywhere in the universe, as long as that file contains exactly that text.
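To make the hash ID idea concrete, here is a minimal Python sketch of the same computation as the printf | shasum pipeline above. It assumes a repository that uses SHA-1 object names, and the function name git_blob_id is my own, not part of Git:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash 'blob <size>\\0' followed by the raw bytes, as in the shasum example."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


crlf_file = b"hello\r\nworld\r\n"  # the 14-byte file from the text
lf_file = b"hello\nworld\n"        # same words, LF-only endings (12 bytes)

print(len(crlf_file), git_blob_id(crlf_file))
print(len(lf_file), git_blob_id(lf_file))
```

The two IDs differ because the bytes differ, even though most viewers would display the two files identically.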
Now suppose, having created this file (or not), we turn on the "do mess with line endings" options. We now get numerous sub-options, specifying just how Git will go about messing with line endings, when, on which files. These include eol=crlf, eol=lf, text, binary, and so on:
*.bat text eol=crlf
*.sh text eol=lf
*.jpg binary
This fragment tells Git that if the file's name ends with .bat, Git should mess with line endings in one particular way; if it ends with .sh, Git should mess with line endings in another particular way; and if it ends with .jpg, Git should not mess with line endings.
We know that the binary specification means that for such files, Git doesn't mess with line endings. This is good since, for instance, .jpg files do not actually have lines in the first place, so that anything that resembles a line ending is just coincidence. When Git isn't messing with anything, it's all easy: Git is storing what's there and showing you what's stored.
But that's no longer true for the other files. Since Git is now messing with their line endings, it becomes important to ask and answer more questions:
When exactly does Git mess with the line endings?
What exactly does Git do when it does this messing-about?
This is where things get complicated. The key to understanding things here is to know about Git's index. This thing—this "index"—is central in Git and you really do have to know about it to use Git properly, so let's take a tour of the index.
Git's index
Git's index is either so important or so poorly named (or both) that it actually has three names. It is also called the staging area, which refers to how you normally use it, and it is sometimes called the cache. This last name is pretty rare these days: you mostly see it in flags like git rm --cached. (Some commands, like git diff, have both --staged and --cached, with the same meaning. For some reason no one has gotten around to adding git rm --staged yet. I thought that would have happened by now, and I still think it will happen someday.)
The index does a bunch of things for Git, but here we really care about what it does for—and to—you. What it does for you is hold your proposed next commit. Git is, fundamentally, not about files, but rather about commits. Each commit holds files: in fact, each commit has a full snapshot of every file. (Each commit also has some metadata, such as the name and email address of the commit's author, but we'll skip that here.)
The thing about commits, though, is that they're purely read-only. You can make new ones, but you can never change any existing commit. The git commit --amend command, for instance, fakes it: it does not change the existing commit, it makes a new one and stops using the old one in favor of the new one instead. When you can't tell the difference—and sometimes you can't—this is just as good. When you can tell the difference—and sometimes you can—the cracks show through.
But if you can't change a commit (and you can't) and if, as is also true, the files inside a commit are in a special, compressed, de-duplicated, Git-only form that no programs other than Git itself can even read in the first place, how can you use the files that are inside a commit? The answer is simple enough: In order to use a commit, you have to have Git extract that commit first. We run git checkout or git switch to achieve this. Git extracts the files from the commit, placing usable versions of them in our working tree or work-tree, where we can see them and get our work done.
Git could stop here, with committed files—read-only inside the current commit, frozen for all time—and working files. Other version control systems do stop here. But Git doesn't. Instead, as it's extracting the commit, Git puts "copies" of each file into Git's index.
I put "copies" in quotes here because the files in Git's index are stored in the internal, compressed, de-duplicated format. Since they were just extracted from some commit, they take no space: they're de-duplicated away. They hold the same data in the index that they hold when they're inside the commit: this data is frozen for all time.
What's special about the index "copies" of files is that, unlike the committed copies, you can replace them. The git add command tells Git: compress and de-duplicate the working tree file. Git reads the working tree copy, compresses it, and checks to see if the compressed result is a duplicate of some existing file in any existing commit. (This is where that blob hash ID trick comes in: it's why any file consisting entirely of hello\r\nworld\r\n has hash ID 23eb407b644b0e362fa224168ecd0adfa02b022a.) If this is a duplicate, Git puts the duplicate's hash ID in the index. If it's not a duplicate, Git arranges to store a new blob in the object database,1 and stores the new blob's hash ID in the index.
Either way, after this update-the-index step, the proposed next commit is now updated. The file you git add-ed is now staged, and git status will compare the staged hash ID to the current-commit hash ID and say staged for commit if these hash IDs don't match. (This means that git add-ing a file that's been turned back to match the committed copy takes away the staged for commit message, even though the file will in fact be in the next commit. It's just that the hash IDs now match!)
So, Git's index holds this proposed next commit. To make a new commit, you:
futz with the files in your working tree;
run git add on them to copy them back into Git's index; and
run git commit to package up whatever is in Git's index right then.
This is why you have to keep git adding a file each time you change it: Git doesn't automatically copy the working tree file back into the index. Git only copies it back when you say to do that.2
The end effect—and what you should take into the next section—is that, at all times, Git has three copies of each file:
HEAD index work-tree
--------- --------- ---------
README.md README.md README.md
img.jpg img.jpg img.jpg
main.py main.py main.py
for instance. The work-tree version is the one you can see, read, write, feed to a JPG viewer, run with the Python program, and so on. The other two are for Git: the HEAD version is the frozen-for-all-time copy from the current commit and the index version is the malleable-but-frozen-format copy, ready to go into the next commit.
The git checkout or git switch command switches to some commit, copying the files out of the commit to Git's index and then to your working tree.
The git restore command reads a file from somewhere—a commit or the index—and writes it to the index and/or your working tree based on the -S (write to staging) and -W (write to work-tree) options.
The git reset -- file command reads a file from the current (HEAD) commit and writes it to Git's index, leaving your working tree copy alone. (The -- here is a precaution, in case the name of the file is, say, master or dev or something that resembles a branch name.)
The git add file command reads a file from your working tree and writes it to the index.
(Lots of alternatives are not listed here.)
So all these various commands are tricks for manipulating the index and/or working tree copy, in preparation for making the next commit (since Git is mostly about making new commits, while keeping all the old ones).
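If it helps, the three-copies picture can be mimicked with a toy model. This is only a sketch: ToyRepo and its methods are hypothetical names, the "files" are plain strings, and real Git compares hash IDs rather than contents. It covers just git add, git commit, and the comparisons git status makes:

```python
class ToyRepo:
    """Toy model: three copies of each file, keyed by path."""

    def __init__(self, files):
        self.head = dict(files)      # frozen snapshot of the current commit
        self.index = dict(files)     # proposed next commit
        self.worktree = dict(files)  # files you can see and edit

    def add(self, path):
        # git add: copy the working tree version into the index
        self.index[path] = self.worktree[path]

    def commit(self):
        # git commit: package up whatever is in the index right now
        self.head = dict(self.index)

    def status(self, path):
        # (staged for commit?, changed but not staged?)
        staged = self.index[path] != self.head[path]
        not_staged = self.worktree[path] != self.index[path]
        return staged, not_staged


repo = ToyRepo({"main.py": "v1"})
repo.worktree["main.py"] = "v2"
before_add = repo.status("main.py")    # edited, not yet added
repo.add("main.py")
after_add = repo.status("main.py")     # staged for commit
repo.worktree["main.py"] = "v1"        # turn the file back
repo.add("main.py")
after_revert = repo.status("main.py")  # index matches HEAD again
print(before_add, after_add, after_revert)
```

The last step shows the point made earlier: adding a file that has been turned back to match the committed copy makes the "staged" report disappear, because index and HEAD agree again.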
1Git actually stores the new compressed blob object immediately, even if it winds up being replaced before you make a new commit. This is okay (if perhaps sub-optimal in certain peculiar situations) because Git will run git gc for you now and then. Certain older Git versions had a bug where git gc didn't get run often enough, and this could actually be a problem, but that's been fixed for years now.
2Using git add -u tells Git to find modified working tree files, and add them, which automates the job. Using git commit -a is a lot like running git add -u && git commit: it runs a git add -u step before the commit. However, -a complicates things a bunch, and interacts badly with poorly-written pre-commit hooks, so it's kind of a bad idea. Try not to rely on it: use git add -u instead, in case you have one of these bad commit hooks. Or, learn to love the index, which lets you play clever tricks like git add -p, although this too interacts badly with poorly-written pre-commit hooks.
How and when Git messes with line endings
If:
Git is told to mess with line endings, and
a file is marked text, so that Git will mess with this file, or the text=auto setting is being used and Git guesses that this file is text
then:
Git will optionally mess with the file's bytes on the way from index to working tree (checkout or switch, restore, various kinds of reset, etc), and
Git will mess with the file's bytes on the way from working tree to index (add, mostly).
What messing-about will Git do? That depends on the eol= setting:
eol=crlf: On the way out, Git will change LF-only to CRLF. If a line reads hello\n in the index, Git will write hello\r\n to the working tree copy. On the way in, Git will change CRLF to LF-only. If a line reads hello\r\n in the working tree copy, Git will write hello\n to the index copy.
eol=lf: On the way out, Git will do nothing to the file. On the way in, Git will change CRLF to LF-only.
That's it—that's all Git will do! It won't ever change LF to CRLF on the way in, for instance. In that sense, we could say that Git "prefers" LF-only line endings. (If you want something fancier, you can write clean and smudge filters, which also operate on data "on the way in" and "on the way out" respectively, and here you can do whatever you like. But the built in stuff inside Git is limited to these few CRLF options.)
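The two directions can be written down as a pair of small functions. This is a simplified Python sketch of the rules just described, not Git's actual conversion code (which has extra heuristics); the names to_index and to_worktree are made up for illustration:

```python
import re


def to_index(worktree_bytes: bytes) -> bytes:
    """'On the way in' (git add): CRLF becomes LF-only, whatever eol= says."""
    return worktree_bytes.replace(b"\r\n", b"\n")


def to_worktree(index_bytes: bytes, eol: str) -> bytes:
    """'On the way out' (checkout and friends): depends on the eol= setting."""
    if eol == "crlf":
        # Turn LF-only endings into CRLF; leave existing CRLF pairs alone.
        return re.sub(rb"(?<!\r)\n", b"\r\n", index_bytes)
    if eol == "lf":
        # Nothing to do on the way out.
        return index_bytes
    raise ValueError("eol must be 'crlf' or 'lf' in this sketch")


print(to_worktree(b"hello\n", "crlf"))  # index -> working tree, eol=crlf
print(to_index(b"hello\r\n"))           # working tree -> index
```

Note that there is no branch anywhere that turns LF into CRLF on the way in, which is the sense in which Git "prefers" LF-only endings.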
There's one more tricky bit: Git tries hard to optimize not making copies, in or out, of the index and working tree. This attempt usually works right, but it fails (by not making copies when it should make copies) if and when you switch around whether and how Git should mess with line endings. The tricks you linked to, where you rm .git/index for instance, are mostly ways to get around this. Removing the index forces Git to copy data in exactly those cases where Git thinks it doesn't need to, but where the changed status of a file (from -text to text, or eol=lf to eol=crlf, or whatever) means that it does.
This is all that you need to memorize. The remaining details can be worked out.
Consequences
Suppose you have a repository in which, in every commit that has text files, all committed copies have LF-only line endings. Since this is, in effect, Git's "preferred" format, the files are already all "OK". If you choose to have Git mess with files, all future commits will have LF-only line endings too, and the future commits will match the existing commits.
But suppose you have a repository in which some or all text files are committed with CRLF line endings. These commits are frozen for all time! You literally cannot change them. They will continue to have CRLF line endings. If you now begin choosing to have Git mess with files, future commits will gradually, or suddenly all at once, have some or all files with LF-only line endings, as stored in the repository.
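This second case is very likely what you are seeing: the file on disk can be byte-for-byte what was committed, and still count as modified, because what matters is what git add would now store. A minimal Python sketch of that comparison follows, assuming SHA-1 object names; git_blob_id and clean are illustrative names, and clean stands in for the "on the way in" conversion:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Blob hash: SHA-1 over 'blob <size>\\0' plus the raw bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def clean(worktree_bytes: bytes) -> bytes:
    """What 'git add' would store once the file is treated as text."""
    return worktree_bytes.replace(b"\r\n", b"\n")


committed = b"hello\r\nworld\r\n"  # blob frozen in an existing commit (CRLF)
worktree = b"hello\r\nworld\r\n"   # what is on disk after checkout

same_bytes_on_disk = worktree == committed
ids_differ = git_blob_id(clean(worktree)) != git_blob_id(committed)
print(same_bytes_on_disk, ids_differ)  # prints: True True
```

A diff tool that compares the two CRLF files sees nothing, while Git sees a proposed LF-only blob that does not match the committed CRLF blob.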
Regardless of which of the above statements about the existing repository are true, your settings, should you set them, will affect how you see the files in your working tree, because to get into your working tree, Git has to extract the files from commits. But your file viewers might not show you what the ends of lines look like. That is, if your preferred file viewer displays a CRLF line and an LF-only line as identical, they'll look identical, even when they aren't.
The fact that the ends of lines "change" can make a change that Git considers a change. If the existing commits in the repository have CRLF line endings, and you start having Git mess with line endings, it's a good idea to do one "normalizing" commit. You will become the owner of every line of every file that is changed this way but git blame, at least, has a way to "skip over" a specific commit, if you need to figure out where some code came from. Since this "fix all files, but no real changes" commit doesn't do anything except normalize these lines, you can tell git blame to skip over it.
Note that Git (and git diff) do consider these lines different, unless you tell git diff to ignore certain white space changes:
--ignore-cr-at-eol: Ignore carriage-return at the end of line when doing a comparison.
-w, --ignore-all-space: Ignore whitespace when comparing lines.
(There are others; this is just a partial list.)
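As a rough analogue of what --ignore-cr-at-eol means, here is a small Python sketch that compares two files line by line after dropping one trailing carriage return per line. It is an illustration of the idea only, not how git diff is implemented, and the function name is my own:

```python
def same_ignoring_cr_at_eol(a: bytes, b: bytes) -> bool:
    """Compare two byte strings line by line, ignoring a CR just before each LF."""

    def lines(data: bytes):
        # Split on LF, then drop a single trailing CR from each line if present.
        return [ln[:-1] if ln.endswith(b"\r") else ln for ln in data.split(b"\n")]

    return lines(a) == lines(b)


crlf = b"hello\r\nworld\r\n"
lf = b"hello\nworld\n"
print(crlf == lf, same_ignoring_cr_at_eol(crlf, lf))
```

The raw comparison says the files differ; the CR-insensitive comparison says they match, which is the distinction between plain git diff and git diff --ignore-cr-at-eol.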
Other items that should be mentioned here
When Git commits a file, it stores both the file's data and its "mode". Git has two modes for files, which it calls 100644 and 100755 when it shows them, but for which git update-index has a --chmod option that it spells -x and +x respectively. This tells Git that on a Unix-like system or any other system that has an equivalent, the 100755 or +x file should be marked executable at checkout.
Most Windows file systems currently don't have an equivalent. In this case, Git tries to retain the chmod setting from the existing checkout. The rm .git/index trick defeats this "retain the old setting" trick. So it's possible to change the mode of files when fixing end-of-line issues. This is why it's better to use git add --renormalize after changing CRLF line endings settings, if your Git supports this.
The general idea that there are some changes, or features of files, that are invisible or hard to see is a little weird, but we have non-computing examples: for instance, in fine typesetting, we have the hyphen (-), the en-dash (–), and the em-dash (—). These may or may not display on your computer as different width dashes. We have other computer examples, such as the Whitespace programming language or the terrible mistake with makefile syntax (where tabs are significant). And, in spycraft—whether or not we use computers—we have steganography.
PSR-12 for PHP and Airbnb's ESLint config for React require LF line endings over CRLF. I see in the ESLint docs that they recommend adding a .gitattributes file with content similar to this:
*.js text eol=lf
I checked the Git documentation and it mentions that using eol can make paths be considered dirty. What is meant by this? I also notice there's mentions of core.safecrlf later in the docs, so can these types of conversions cause irreversible problems?
Will I also need to set core.autocrlf to false so that .gitattributes takes effect?
When you set a file in text mode in Git, that tells Git that you want it to perform end-of-line conversion. That means that Git will always write the file into its internal storage with LF endings, and then check out the endings that are specified, whether that's due to .gitattributes or various configuration options.
If, however, the repository already contains files with CRLF line endings checked in, then setting the eol option will cause the file to be checked into the repository with LF endings, as mentioned above. This will make Git think the file is modified, which it is. That's what's meant by making "the paths to be considered dirty."
The easiest way to solve this problem is to add the entries to .gitattributes, add the .gitattributes file, and then run git add --renormalize . and then commit. That way, any files which had CRLF endings will be converted to LF endings in the repository.
You don't need to set core.autocrlf as well. For the files it matches, the .gitattributes file overrides that setting.
Let me clarify: I want git to not care about whether line endings are CRLF or LF on checkin/commit. I understand there is no way at the moment to make git not care if a file has mixed line endings, although I would love a workaround to this, just in case; I just want it not to care whether all line endings in a file are CRLF or LF.
I recently set many file extensions in my system .gitattributes file, /etc/gitattributes (using MSysGit), to tell git which extensions are usually text or binary. For most of the files I want git to think are text, I set the extension
*.extension text=auto
because this will tell git that files with these extensions should have the general system line endings. Now I am regretting that decision, as I am seeing how many files are, for one reason or another, automatically given LF line endings instead of CRLF. Now, after tinkering with this and other settings, I have been getting errors similar to
$ git add -A && git commit -m "signup/in/out now possible through passport"
fatal: LF would be replaced by CRLF in node_modules/mongoose/node_modules/ms/package.json
on a lot of files I try to check in. In this case, it seems to be npm that's causing these files to be created as LF instead of CRLF, but I'm sure there are many other causes.
To be honest, I personally don't care which type of line endings a specific file has, as long as I can read and edit these files in my editing tool(s) of choice, as the vast majority of the time the line endings don't have any special function besides being, well, line endings. If it really matters, I can always do a quick conversion with unix2dos or dos2unix. However, git is notoriously finicky with line endings, and I don't want it to accidentally mark a text file as binary or vice versa, hence why I have been changing all these defaults.
How do I make git check in all text files as LF-line-ended files, and check them out as CRLF, but not care whether they have CRLF or LF endings in my actual working tree? Alternatively, is there a way to have git convert all the text files with LF endings to CRLF in my working tree as well, instead of giving the warning and giving up?
EDIT It seems my issue was not with my gitattributes files, but with my core.safecrlf setting in my gitconfig.
My issue seems to be with another config setting I set in git, core.safecrlf. According to the accepted answer to this question, which clarified things in several blog posts on the subject, this setting checks to see if the files git is checking in or out will have their line endings changed. If it determines they will be changed, it aborts the operation. I didn't understand this setting before, but now that I've played around a bit, I think I do understand it.
From what I can tell by the playing around, it seems that this setting is only useful for binary files with extensions not specified in your gitattributes, as well as files where line endings actually have a meaning in the language you will be using to edit them. As an example, let's assume all files in this language have the extension .ext. If this language uses the symbol computers use to denote LF and/or the one they use to denote CRLF, git shouldn't convert these .ext files. I don't know of any languages like that, but if they exist, and if the programmer still wants git to interpret files written in these languages as text, the programmer should have a special attribute set in his/her gitattributes, instead of having *.ext text.
Other than these 2 types of files, I can't think of any other situation to use core.safecrlf=true. Therefore, until I encounter a situation like these, I will be having this setting unset, or perhaps set to warn.
In your .gitattributes, just add this line. It will convert all line endings to CRLF on checkout, no matter what they were before.
* text eol=crlf
See http://git-scm.com/docs/gitattributes, which says of the eol attribute:
Set to string value "crlf"
This setting forces Git to normalize line endings for this file on checkin and convert them to CRLF when the file is checked out.
git config core.autocrlf <option>
It can take these options (shown as: file as added -> stored in the repository -> checked out):
1) true: x -> LF -> CRLF
2) input: x -> LF -> LF
3) false: x -> x -> x
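The three arrows can be sketched in Python as a store step and a check-out step. This is a simplified model of the table above under my own function names (store, check_out); real Git also decides whether a file is text at all and has safety rules this ignores:

```python
import re


def store(data: bytes, autocrlf: str) -> bytes:
    """Working tree -> repository."""
    if autocrlf in ("true", "input"):
        return data.replace(b"\r\n", b"\n")
    return data  # "false": stored exactly as it is


def check_out(stored: bytes, autocrlf: str) -> bytes:
    """Repository -> working tree."""
    if autocrlf == "true":
        # Convert LF-only endings to CRLF; leave existing CRLF pairs alone.
        return re.sub(rb"(?<!\r)\n", b"\r\n", stored)
    return stored  # "input" and "false": written out as stored


x = b"a\r\nb\n"  # a file with mixed endings, standing in for "x"
for mode in ("true", "input", "false"):
    stored = store(x, mode)
    print(mode, stored, check_out(stored, mode))
```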
I do development on Mac OS X. I have a user who is contributing code with CRLF line endings. He currently does not use git. I create a branch, then switch my working tree to it. I copy his file into the working tree. When I try to stage the file, I receive the error fatal: CRLF would be replaced by LF in pcb-gcode.ulp.
I've been through endless posts and tried suggestions (such as .gitattributes & git reset) and the only solution seems to be to use sfk or similar to change the line endings when I get the file from him.
Is there a way to have git change his CRLF line ends to LF when staging and committing, and use LF if I checkout the branch to my working tree? It seems that there would be an option to have git just recognize a line ending as a line ending and give me what is appropriate for my OS when I check it out.
git config --global -l (excerpt)
core.autocrlf=input
core.safecrlf=true
git config --local -l
(nothing relevant)
I'm using SourceTree and the remote repo is hosted on Assembla, in case that is pertinent.
I think you can try the new (1.7.2+) core.eol setting:
Sets the line ending type to use in the working directory for files
that have the text property set. Alternatives are 'lf', 'crlf' and
'native', which uses the platform's native line ending. The default
value is native
And don't use core.autocrlf, which is buggy and not obvious.
I have Windows machine with VS project and I use both Visual Studio and tools from Cygwin environment including Git. Sometimes I get different line endings in files after editing. I want simple solution to check files' line ending consistency before they go to the repo. Git's core.safecrlf is the right thing I suppose.
Now I have a strange behavior:
Files A and B with following parameters:
$file A
A: HTML document, UTF-8 Unicode text, with CRLF line terminators
$file B
B: HTML document, UTF-8 Unicode (with BOM) text, with CRLF line terminators
File A is already in repo, file B is new one. Note, both have CRLF line endings. Now try to stage them, core.safecrlf is true.
$git add A # ok
$git add B # fails
fatal: CRLF would be replaced by LF in B.
Am I using core.safecrlf correctly? Or maybe I need to write a hook to check files?
Notes:
tried with different file encodings (with and without BOM), no difference.
there's related core.autocrlf feature in Git, added it to tags (Stackoverflow has no tag for core.safecrlf)
git version 1.8.5.rc1.17.g0ecd94d (compiled from sources under Cygwin)
EDIT #1: checked out core.autocrlf - it was input. Changed to false, now I can add both files. Why?
According to your later edit, core.autocrlf = input was the original setting. With this setting CRLF is converted to LF when a file is checked in to the repository, and the file is kept that way when checked out. This means an editor that does not understand Unix line endings, such as Notepad, would display the checked-out version of the file as one giant line.
FWIW core.autocrlf = input is the preferred setting on Unix systems, and the default Cygwin build probably set it that way. On Windows the recommended setting is core.autocrlf = true, which is what msysgit recommends.
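core.safecrlf = true aborts the conversion of a file if the conversion is not reversible, that is, if checking the file in and then checking it out again would give you a file that differs from the one in your working tree. With core.autocrlf = input, file B would be stored with LF endings and checked out with LF endings, so its CRLF endings would be lost. This is why adding file B was aborted. File A was probably accepted because it is already in the repository with CRLF endings, and Git's autocrlf handling leaves such files unconverted. The difference between core.SAFEcrlf and core.AUTOcrlf should be noted. It is one of the eyes-glazing-over issues in understanding Git line endings.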
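One way to think about the safecrlf check is as a round-trip test: convert the file as if checking it in, convert it back as if checking it out, and complain if the result is not what you started with. The Python sketch below models only that idea. The function names are mine, the messages imitate the ones quoted in the question, and the model ignores any special handling Git applies to files that are already in the index:

```python
import re


def clean(data: bytes, autocrlf: str) -> bytes:
    """Check-in conversion for core.autocrlf = true / input / false."""
    if autocrlf in ("true", "input"):
        return data.replace(b"\r\n", b"\n")
    return data


def smudge(stored: bytes, autocrlf: str) -> bytes:
    """Check-out conversion: only 'true' rewrites LF-only endings as CRLF."""
    if autocrlf == "true":
        return re.sub(rb"(?<!\r)\n", b"\r\n", stored)
    return stored


def safecrlf_check(data: bytes, autocrlf: str):
    """Return None if the round trip preserves the bytes, else a message."""
    round_trip = smudge(clean(data, autocrlf), autocrlf)
    if round_trip == data:
        return None
    if b"\r\n" in data and b"\r\n" not in round_trip:
        return "CRLF would be replaced by LF"
    return "LF would be replaced by CRLF"


print(safecrlf_check(b"<html>\r\n", "input"))  # a new CRLF file, as with file B
print(safecrlf_check(b"<html>\r\n", "false"))  # no conversion, nothing to lose
```

With autocrlf set to false no conversion happens at all, so there is nothing for the check to object to, which fits what you saw after changing the setting.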
core.autocrlf = false is don't-care mode. It will check in and check out the files exactly as they are, which is why the commits now work. This is not very smart and is not recommended, because it causes problems if the files are edited on both Windows and Unix systems, and also if other users' core.autocrlf settings differ and change the file endings.
My own preference is to set core.autocrlf to input on Windows if all the Windows editors and other text file processing tools on the project are Unix line ending aware, otherwise set it to core.autocrlf = true for Windows and core.autocrlf = input for Unix. In any case this approach is outmoded by the superior method of the .gitattributes file.
The .gitattributes file method processes files based on the file name and is maintained in all users environments as it is checked out into the working directory like .gitignore. The settings for as many file names and types as are relevant to your project should be added to .gitattributes. Your configuration then falls back onto the local core.autocrlf and core.safecrlf settings if the file name is not matched in .gitattributes. Adding * text=auto at the top of .gitattributes will cause git to do a best guess on file names which are not matched by the later .gitattributes settings.
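To illustrate how a catch-all line at the top combines with more specific lines below it, here is a much-simplified Python sketch of attribute lookup. It assumes patterns are matched against the file's base name only with fnmatch-style globbing, which is cruder than Git's real pattern rules, and it expands only the binary macro. The function names and the sample file are made up for the example:

```python
from fnmatch import fnmatchcase


def parse_attributes(text):
    """Turn .gitattributes text into a list of (pattern, [attrs]) rules."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        rules.append((pattern, attrs))
    return rules


def attributes_for(path, rules):
    """Later matching lines override earlier ones, attribute by attribute."""
    name = path.rsplit("/", 1)[-1]
    result = {}
    for pattern, attrs in rules:
        if not fnmatchcase(name, pattern):
            continue
        for attr in attrs:
            if attr == "binary":
                # 'binary' is a macro for: -diff -merge -text
                result.update(binary=True, diff=False, merge=False, text=False)
            elif attr.startswith("-"):
                result[attr[1:]] = False
            else:
                key, _, value = attr.partition("=")
                result[key] = value or True
    return result


rules = parse_attributes("* text=auto\n*.sh text eol=lf\n*.png binary\n")
for path in ("build.sh", "notes.unknown", "img/logo.png"):
    print(path, attributes_for(path, rules))
```

In this model build.sh ends up as text with eol=lf, a file with an unlisted extension keeps text=auto (Git guesses), and the .png has text unset so no conversion applies.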
This web page, Mind the End of Your Line, helped me understand the issue better. You might read it for more background on the issue.
The CR LF line ending choices are not that easy to understand. There are two places for the descriptions in that it is covered both in Git-attributes and Git-config manuals.
Initially there were the autocrlf settings, and then there were the newer versions which have some potential incompatibilities (i.e. do unexpected things as you indicate).
I tend to set eol=lf, which makes all text files be committed with LF line endings (you can set attributes as to which files are considered text), and then add safecrlf to do a round-trip check.