Git rename detection when class and filename changed in one commit - visual-studio

What is the best way to handle class renames (e.g. done with Resharper) with Git?
That is, if both the class name and containing file name are changed together and committed without further changes.
It seems the way Git handles renames via a percentage changed heuristic is a bit hit and miss.
For large classes it will be recognised as a rename but for small classes the percentage threshold is reached such that it will be seen as a delete and add.

Keep in mind that in Git's history, file renames are not stored as "this was renamed from X to Y". Instead, the file X exists in one revision, and in the next revision Y exists (and X doesn't). For example:
Revision | Files
---------+----------------------------------
HEAD^ | a.cpp x.cpp z.cpp
HEAD | a.cpp y.cpp z.cpp
In the above diagram, each revision is a row and each contains three files. Between the two revisions, x.cpp was renamed to y.cpp. The only information that the repository stores is the contents of each separate revision.
When Git (or another tool that reads Git repositories) looks at the above history, it notices that y.cpp is a new file in HEAD. Then it looks at the previous revision to see whether a similar file existed. In the case of a straight file rename, then yes, a file called x.cpp with the identical contents existed in the previous revision (and no longer exists in the current revision). So the new file is shown as a rename from x.cpp to y.cpp.
In the case of a rename-and-modify, Git will look at the previous revision's files to see if one file looks close to the new file (in terms of its contents). This is where the heuristic comes in. If most of the lines are the same, then Git will show it as a rename, but if there are enough changed lines compared to unchanged lines, then Git will simply say it looks like a new file.
To answer your question, the best way to handle resharper class renames is to simply do it and commit the new files. Git stores the old and the new files in its repository. Rename detection is handled later, at the time you actually ask about the history. This is why commands such as git log have options like --find-copies and --find-copies-harder.

Years later this is still a quirk I guess because it's fundamental to the way git works.
What I do now is do the rename, commit, then do the change. Bit annoying with refactoring tools but no other solution retains history (--find-copies and --find-copies-harder don't seem to work).

Related

How do I not commit the development team lines in project.pbxproj without deselecting those lines manually?

I am collaborating with my friend on an iOS app. We use different Apple IDs in our Xcodes, so in "Signing and Capabilities" tab of project settings, we select different teams in the "Team" field:
From my observation, changing this affects the MyProject.xcodeproj/project.pbxproj file, which stores the file references that the Xcode project has, in addition to the "Team". Here's a snippet of what is changed:
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = <my team ID>; /* this is changed */
INFOPLIST_FILE = MyProject/Info.plist;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"#executable_path/Frameworks",
);
PRODUCT_BUNDLE_IDENTIFIER = io.github.sweeper777.MyApp;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 5.0;
TARGETED_DEVICE_FAMILY = 1;
};
The problem arises, when one of us commits this file and the other person pulls. The "puller" will now have the "Team" set to something invalid. When this person then tries to run the app on a real device, there will be code signing errors for obvious reasons. To solve this, this person must tediously go through all the targets that we have, and set each "Team" to their own team.
How can we make it so that on each person's computer, the "Team" stays the same after pulling, but any other changes to MyProject.xcodeproj/project.pbxproj is applied?
Remarks:
Putting the entire MyProject.xcodeproj/project.pbxproj in .gitignore doesn't work, because that would ignore every other change to it. Adding a new file to the project, for example, also changes MyProject.xcodeproj/project.pbxproj, and we want to be able to pull that change.
Manually deselecting the lines that say "DEVELOPMENT_TEAM = ..." when committing is as tedious as reselecting the correct team every time, so that's not a solution.
I found this. Apparently, I can configure git to run sed before git checkout and git add. However, that answer seems ignore the line by deleting it completely. This means that my friend, when he pulls, would still have to reselect the correct team. What I want is the kind of "ignore" that simply stops tracking that line. That is, if there is a local version of that line, use that.
I am also aware that this all wouldn't be a problem if we are on the same team. But if I understand this correctly, I can't have multiple people on my team unless I have a Company account, and not only can I not afford that, I don't own a Company.
I don't use Xcode itself and do not know how to smuggle Git hooks and scripts past the Xcode interface, so you'll need more than just this answer. But you mention sed in comments, and given your proposed file format, that may well be the way to go:
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = <my team ID>; /* this is changed */
INFOPLIST_FILE = MyProject/Info.plist;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"#executable_path/Frameworks",
);
PRODUCT_BUNDLE_IDENTIFIER = io.github.sweeper777.MyApp;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 5.0;
TARGETED_DEVICE_FAMILY = 1;
};
Git has the ability to run what it calls clean and smudge filters. These can be used to run any arbitrary program you like, including sed, the "stream editor", which is particularly good at making single-line changes based on regular expression matches.
There is another method that may also work, and may "play better" with Xcode, or may play worse. I'll go over that too, after covering clean and smudge filters.
Before we dive into writing clean and smudge filters, and using them from Git—you'll need to know all of these details as you will have to write your own custom filters—we should start with a simple fact about Git commits: No part of any commit can ever be changed. Once you make a commit, the stuff that's inside the commit—the stored data in all of its files—is the way it is, forever. So these filters have to work within that system. Remember that, as it will help with understanding what we're doing.
How Git makes and stores objects
The files inside a commit are not files, exactly: they're not the same thing as files in your file system, at least. Instead, they are what Git calls objects, specifically blob objects. A blob object holds the file's data; other objects hold the file's name; and commit objects collect everything together to be used all at once. There's one more internal object type for annotated tags but we'll stop here as we're really only interested in the blob-object part.
When Git extracts a commit, it reads the internal blob objects and runs them through internal code to decompress and format them into regular files. This can include doing end-of-line hacking (turning LFs into CRLFs) if desired. Normally, all this happens entirely inside Git, and the end result is that Git writes out an ordinary everyday file for you to use. This ordinary file is what you will work on / with, in Xcode or any other editor and compiler system and so on. These ordinary files are in your working tree.
After you've extracted some commit, you'll do some work on it, by changing some or all of the files in your working tree, to achieve whatever result you wanted. This can include changing the buildSettings, editing Swift code, editing Objectionable-C Objective-C code, and so on. You might add all-new files to the working tree, some of which you never commit at all (you can help make sure this never happens by listing such files in .gitignore).
Eventually, though, you'd like to commit the updated code. To do so, you must run git add, or maybe have your IDE run git add for you (perhaps Xcode has clicky buttons to do this). This invokes code in Git that converts the working tree file(s) back to internal blob objects if and as needed.
Again, normally this is all handled entirely inside of Git. Git will read the working tree file, maybe do CRLF-to-LF-only changes, compress the text, search for duplicate objects, and do all the other complicated things necessary to prepare the file, so that it is ready to be committed. The resulting data need not match what's in your working tree at all: it just has to be something that, when Git later goes to extract the file, produces what you will need in your working tree.
Clean and smudge filters
This is where these clean and smudge filters come in. I said, above, that normally, Git does the extraction and insertion all on its own. For binary files, the only thing Git does here is apply lossless compression.1 For text files, Git can do CRLF/LF substitutions as well. But what if you'd like to do your own operations?
You can: Git will let you do whatever you want during the extract process with a smudge filter, and will let you do whatever you want during the compress process with a clean filter. The clean filter replaces the in-file data, using a stream-edit type process,2 and then Git does its CRLF hacking if any and compressing on the "cleaned" data. The smudge filter replaces the decompressed, post-CRLF-hacking data coming out of Git with the data that should go into the working tree.
Hence you can write, as your clean filter, a sed script of the form:
s/DEVELOPMENT_TEAM = .*;/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/
With that as the entire sed script, what sed will do is edit the incoming data stream and replace any actual development team text with the word DEVTEAMTEMPLATE.
Your smudge filter has to work slightly harder: it must find the template line and adjust it so that it contains the correct team ID. Where will you get the correct team ID? That's up to you: perhaps you can store it in a file in your home directory, or in a file that you create in the working tree but never commit in Git. You'll have to write this one or two or however-many-liner sed and/or shell script yourself.
1There are multiple phases of compression; git add does just one, and git checkout undoes all—including reading from "pack" files—as needed. The deeper level of compression, using delta encoding techniques, is entirely invisible at the "object" level, so nobody ever really has to think about it.
2With the advent of Git-LFS, Git gained the ability to run long-lived filters. Before that, Git always used simple stream filtering. The stream filtering is easier to understand, but is less efficient for doing en-masse operations on many files. Here, we're only interested in one file per repository anyway, so there's no need to go into the fancier long-lived filter details.
Defining clean and smudge filters
The tricky part here, with Git, is that you must define the filters in one place—in $HOME/.gitconfig or .git/config, for instance—and then tell Git to invoke them from another place, using the .gitattributes file. This is described in the gitattributes documentation. This documentation is pretty thorough, so read it. You can ignore all the long-running filter discussion, as noted above. I will quote one bit from the documentation here for emphasis, though, and expound on it:
Note that "%f" is the name of the path that is being worked on. Depending on the version that is being filtered, the corresponding file on disk may not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input.
When Git is running the smudge filter, it:
has opened some internal object (which may or may not be packed);
has decompressed it, or is in the process of decompressing it, and pumped / is-pumping out the data; and
this data is being fed to your filter, but is not written out to any file anywhere.
Your filter can use %f to know the name of the target output file, but the data are not in that file yet. The data bytes are only in some OS-level pipes or sockets or whatever your OS uses for connecting the output of one program (Git's internal decompressors) to another (your filter). Your smudge filter must read its standard input to get the data, and write the smudged data to standard output so that Git can read it (if necessary) and/or redirect that output to the correct file. Do not attempt to open the file by name!
(The same holds for the clean filter, except that in many cases, the input to your filter is just the raw data already in the file, so that opening the file and reading it mostly works. So this can mislead you, if you do your tests using a clean filter.)
Note that you can implement this scheme without a clean filter at all: your smudge filter can replace whatever is in the committed file even if it's a real team ID, rather than just a template. If you choose to do this, however, you'll "see" the team ID changing every time a different team-ID commits the file. The nice thing about using the clean filter is that once the committed copies of the file use the template line, every future cleaned file also uses the template line, so that it never changes.
Alternative: a template file
In general, it's unwise to commit actual configurations. Clean and smudge tricks can work, but they can only go so far: this particular file format works well because the change you want made is on a single line, and Git itself shows you file changes on a line-by-line basis, and sed works well with line-oriented input, and so on.
A lot of configuration files, though, wind up storing at least slightly-sensitive data, or perhaps very-sensitive data such as cleartext passwords. Such files should not be stored in Git at all if at all possible. Instead, you would store a template file in Git.
In this case, for instance, instead of storing MyProject.xcodeproj/project.pbxproj, you might have Git store MyProject.xcodeproj/project.pbxproj.template. This file would have template-ized contents. When you clone and check out the repository, you'd subsequently copy the template file into place and do any required adjustments.
Should the MyProject.xcodeproj/project.pbxproj file itself need to change, e.g., to acquire a new SWIFT_VERSION setting, you'd instead edit the template file, add that to Git, and commit. You would then use the usual "convert template to mine" process, or manually update the MyProject.xcodeproj/project.pbxproj file. Since this file is never committed—and is listed in .gitignore—it never goes into any commit and you never have to worry about collisions within it. Only the template file goes into Git.

Git: list case-sensitive paths that have collided during clone

When cloning a git repository that contains case-sensitive file paths (e.g. /README.md and /readme.md) on a case-insensitive file system (like NTFS or APFS), git will only check out one of the colliding files.
In macOS, how can I list all the files that collided because of case insensitivity?
There is no built in thing to find this. phd's comment will get you close, possibly close enough, but may over-fit a bit (although one might still like to know about these things).
For instance, suppose some commit has files:
path/TO/file1.ext
path/to/file2.ext
On your file system, only either path/TO or path/to may exist. Once one of those exists, these two files will be dropped into the same path/$to folder, where $to is either lowercase or uppercase. They will still be separate files, but will be called out by case-folding and sort-and-unique-dash-c-ing.
On macOS, we can also have collisions in paths due to Unicode normalization. Linux considers a file named 's' 'c' 'h' 'combining-umlaut' 'o' 'n' to be one file name, and a file named 's' 'c' 'h' 'o-with-umlaut' 'n' to be a second, different file name. The macOS default file systems will turn both names into a common form and claim that this is just one name. (I have no idea what Windows does with this.) A proper tool will should take this into account as well.
Note that Git will store each file separately in the index, and can update each separate index entry from a file-system-stored-file independent of the stored-file's path name. So we could have Git build a mapping from internal name to external name and make it handle these cases all automatically. But that's a pretty big task.

How to ignore/revert changes of case in file-content?

I am versioning Microsoft Access VBA code, which is in general case insensitive. However changes the case of variable names happen every now in then (by the Access compiler or by the developer). This often leads to huge change set in my git workspace.
How can I revert or ignore changes, that only concern upper- or lowercase of file contents?
An example:
git init
echo "public sub example()\nend sub" > mdlExample.ACM
# ^-- lower e
git add --all
git commit --all --message "Initial Commit"
echo "public sub Example()\nend sub" > mdlExample.ACM
# ^-- upper E
I would love something like:
git restore --only-case-changes # not working
And then:
git status
> On branch master
> nothing to commit, working tree clean
Consider changing example="example" to Example="Example". How do you propose Git could decide which case change to ignore here? Now consider code snippets in comments, or stored as strings for code generators. I get wanting Git to make an annoying chore go away, but I think if you try to imagine telling Git exactly what you want you'll understand the context of your question a little better.
How can I revert or ignore changes, that only concern upper- or lowercase of file contents
When you want to temporarily ignore changes, when you want to do a diff or a blame without seeing those changes, you can use a "textconv" filter that normalizes the text you diff. I use those to do things like strip embedded timestamps out of generated html when diffing, quickest to hand atm is
[diff "doc-html"]
textconv = "sed 's,<span class=\"version\">Factorio [0-9.]*</span>,,;s,<[^/>][^>]*>,\\n&,g'"
wordRegex = "<[^>]*\\>|[^< \\t\\n]*"
in .git/config, and
doc-html/*.html diff=doc-html
*.cfg -diff
in .git/info/attributes.
so my what-changed diffs don't show me things I don't care about.
If you want to see the results of a diff ignoring case, try
[diff "nocase"]
textconv="tr A-Z a-z"
and drop * diff=nocase (or maybe*.vba diff=nocase) into .git/info/attributes. When you're done, take it out.
but for merging, my leadoff example should convince you that Git automatically and silently making case changes in repo content, even just in the text that looks like identifiers, is a Bad Idea. When there's a conflict, not just a one-sided change but two different changes, it's still going to take some human judgement to decide what the result should be. Fortunately, with any decent merge tool, resolving simple conflicts is down around subsecond range each.
You don't have to git restore anything: You could setup a clean content filter driver as illustrated here, which will automatically convert those cases on git diff/git commit.
That means:
you won't even see there is a diff
you won't add /commit anything because of that content filter driver.
Image from "Keyword Expansion" section of the "ProGit book"
This is done through:
a .gitattributes filter declaration, which means you can associate it only to certain files (through, for instance, their extension)
*.ACM filter=ignoreCase
a local git config filter.<driver>.clean to declare the actual script (which should be in your PATH)
git config filter.ignoreCase.clean ignoreCase.sh
# that give a .git/config file with:
[filter "ignoreCase"]
clean = ignoreCase.sh
The trick is:
Can you write a script which takes the content of an ACM file as input and produces the same ACM file as output, but with its strings converted?
You can have the filename in those scripts so you can do a diff and detect if said difference has to be adjusted, but you still need to write the right command to replace only "xxx" strings when their case changes in ACM files.
Note: jthill suggests in the comments to set the merge.renormalize config settings to tell Git that canonical representation of files in the repository has changed over time.
Have you considered the answer from this StackOverflow question:
How to perform case insensitive diff in Git
Maybe you can write a script to go and do the diff comparison for each commit and then add those commits to your branch. It may not be as simple as you like but maybe it will simplify the display of the changes to allow you to get to the case insensitive changes quicker?

How do I get a file manifest for each revision in a git repository?

I have a git repository that was created on Microsoft Windows. Microsoft Windows has a case insensitive file system. The people checking into this repository have not been careful about the case of their filenames. This means that the same directory or file sometimes shows up under two different names.
I mean to fix this problem. But in order to really fix it, I have to get a handle on it.
Is there a quick and simple way to get a list of the files at each revision?
I need this in order to figure out which revisions (if any) have the same file under two different names so I can decide on a strategy for fixing such cases. This means I need to get this information en-masse as quickly as possible so the analysis consumes a resonable amount of time.
One way to get this is with ls-tree:
git ls-tree -r --name-only <commit>
(Note that this looks at the portion of the tree corresponding to your current directory, so you should either run it from the top level of your repo, or give the --full-tree option.)
This is essentially instantaneous, since all Git has to do is recursively examine the tree; it doesn't even have to look at the contents of files.
I'm not sure how you're going to use a list of filenames to detect the same file under two different names. If you just mean that you want to look for filenames that would be the same on a case-insensitive filesystem, then the list of filenames is all you needed.
However, if you think the files might actually have the same content, you could drop the --name-only, so that you'll also see the SHA1s of all the file, and can find identical files by looking for duplicate hashes.
You could run something like this:
git log --name-only --pretty="format:%H"
This command will show the the sha1 and the list of changed files for every revision.

Purpose of "$Id: ..." line in the header of source files

/* $Id: file.c,v 1.0 2010/09/15 01:12:10 username Exp $ */
I find this line in many source code files in the comment at the top (header) of the file. Why? Is it aimed for the version control software? -Thanks.
These sort of comments are automatically modified by various source code control systems, things like author, date, history and so forth.
See here for some common ones for RCS which is the first source code control system I ever saw to implement this sort of thing (that doesn't mean it was the first, just that RCS was the first I ever used and it had that capability).
One particular trick we used to use was to put the line:
static char *fileId = "$Id: $";
into the source file (and header files as well, although the names had to be unique) so that, when it was built, it would automatically have the ID of the files in the executable itself.
Then we could use something like strings to find out which source files were used to build the executable. Ideal for debugging problems in the field.
It tells CVS (and other VCSs) to expand the value of the Id at check-out time, so anybody reading the source file in question will know what version exactly was checked out for it. Not very popular any more (you can always ask your VCS for such info if you keep the source file in a client / repository / working directory -- or however else your VCS calls such things;-).
I believe you are correct. It appears to be a keyword substitution string for CVS.
Take a look at this question $id: name of file, date/time creation Exp $

Resources