How to ignore/revert changes of case in file-content?


I am versioning Microsoft Access VBA code, which is in general case-insensitive. However, changes to the case of variable names happen every now and then (made by the Access compiler or by the developer). This often leads to huge change sets in my git workspace.
How can I revert or ignore changes that only concern upper- or lowercase of file contents?
An example:
git init
printf 'public sub example()\nend sub\n' > mdlExample.ACM
#                  ^-- lower e
git add --all
git commit --all --message "Initial Commit"
printf 'public sub Example()\nend sub\n' > mdlExample.ACM
#                  ^-- upper E
I would love something like:
git restore --only-case-changes # not working
And then:
git status
> On branch master
> nothing to commit, working tree clean

Consider changing example="example" to Example="Example". How do you propose Git could decide which case change to ignore here? Now consider code snippets in comments, or stored as strings for code generators. I get wanting Git to make an annoying chore go away, but I think if you try to imagine telling Git exactly what you want you'll understand the context of your question a little better.
How can I revert or ignore changes, that only concern upper- or lowercase of file contents
When you want to temporarily ignore changes, when you want to do a diff or a blame without seeing those changes, you can use a "textconv" filter that normalizes the text you diff. I use those to do things like strip embedded timestamps out of generated html when diffing, quickest to hand atm is
[diff "doc-html"]
textconv = "sed 's,<span class=\"version\">Factorio [0-9.]*</span>,,;s,<[^/>][^>]*>,\\n&,g'"
wordRegex = "<[^>]*\\>|[^< \\t\\n]*"
in .git/config, and
doc-html/*.html diff=doc-html
*.cfg -diff
in .git/info/attributes.
so my what-changed diffs don't show me things I don't care about.
If you want to see the results of a diff ignoring case, try
[diff "nocase"]
textconv="tr A-Z a-z"
and drop * diff=nocase (or maybe *.vba diff=nocase) into .git/info/attributes. When you're done, take it out.
But for merging, my leadoff example should convince you that Git automatically and silently making case changes in repo content, even just in the text that looks like identifiers, is a Bad Idea. When there's a conflict, not just a one-sided change but two different changes, it's still going to take some human judgement to decide what the result should be. Fortunately, with any decent merge tool, resolving simple conflicts is down around subsecond range each.

You don't have to git restore anything: you could set up a clean content filter driver as illustrated here, which will automatically convert those cases on git diff/git commit.
That means:
you won't even see there is a diff
you won't add/commit anything because of that content filter driver.
[Image: figure from the "Keyword Expansion" section of the Pro Git book]
This is done through:
a .gitattributes filter declaration, which means you can associate it only to certain files (through, for instance, their extension)
*.ACM filter=ignoreCase
a local git config filter.<driver>.clean to declare the actual script (which should be in your PATH)
git config filter.ignoreCase.clean ignoreCase.sh
# that gives a .git/config file with:
[filter "ignoreCase"]
clean = ignoreCase.sh
The trick is:
Can you write a script which takes the content of an ACM file as input and produces the same ACM file as output, but with its strings converted?
You can have the filename in those scripts so you can do a diff and detect if said difference has to be adjusted, but you still need to write the right command to replace only "xxx" strings when their case changes in ACM files.
Note: jthill suggests in the comments to set the merge.renormalize config settings to tell Git that canonical representation of files in the repository has changed over time.
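One possible shape for such a script, as a rough sketch rather than a tested driver: it keeps the committed version whenever the incoming content differs from it only by letter case, and passes everything else through untouched. The function names are made up for illustration, and the sketch assumes the filter is declared as ignoreCase.sh %f so that the path arrives as the first argument. It works at whole-file granularity, so a file with one real change plus case-only changes is passed through unchanged.

```shell
#!/bin/sh
# ignoreCase.sh -- hypothetical clean filter; assumes it is configured as
#   git config filter.ignoreCase.clean 'ignoreCase.sh %f'
# so that the path being cleaned arrives as "$1".

# Print OLD if OLD and NEW differ only by letter case, otherwise print NEW.
prefer_old_if_case_only() {
    old_file=$1
    new_file=$2
    fold_dir=$(mktemp -d) || return 1
    tr '[:upper:]' '[:lower:]' < "$old_file" > "$fold_dir/old"
    tr '[:upper:]' '[:lower:]' < "$new_file" > "$fold_dir/new"
    if cmp -s "$fold_dir/old" "$fold_dir/new"; then
        cat "$old_file"
    else
        cat "$new_file"
    fi
    rm -rf "$fold_dir"
}

# Read the working-tree content on stdin, write the cleaned content to stdout.
ignore_case_clean() {
    path=$1
    work_dir=$(mktemp -d) || return 1
    cat > "$work_dir/new"
    if git cat-file blob "HEAD:$path" > "$work_dir/old" 2>/dev/null; then
        prefer_old_if_case_only "$work_dir/old" "$work_dir/new"
    else
        # No committed version yet (for example a new file): pass it through.
        cat "$work_dir/new"
    fi
    rm -rf "$work_dir"
}

# Act as a filter only when invoked with a path argument.
if [ "$#" -gt 0 ]; then
    ignore_case_clean "$1"
fi
```

Comparing against HEAD is a design choice of this sketch; comparing against the index version would be an equally plausible variant.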

Have you considered the answer from this StackOverflow question:
How to perform case insensitive diff in Git
Maybe you can write a script to go and do the diff comparison for each commit and then add those commits to your branch. It may not be as simple as you like but maybe it will simplify the display of the changes to allow you to get to the case insensitive changes quicker?

Related

How do I not commit the development team lines in project.pbxproj without deselecting those lines manually?

I am collaborating with my friend on an iOS app. We use different Apple IDs in our Xcodes, so in "Signing and Capabilities" tab of project settings, we select different teams in the "Team" field:
From my observation, changing this affects the MyProject.xcodeproj/project.pbxproj file, which stores the file references that the Xcode project has, in addition to the "Team". Here's a snippet of what is changed:
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = <my team ID>; /* this is changed */
INFOPLIST_FILE = MyProject/Info.plist;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"@executable_path/Frameworks",
);
PRODUCT_BUNDLE_IDENTIFIER = io.github.sweeper777.MyApp;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 5.0;
TARGETED_DEVICE_FAMILY = 1;
};
The problem arises when one of us commits this file and the other person pulls. The "puller" will now have the "Team" set to something invalid. When this person then tries to run the app on a real device, there will be code signing errors for obvious reasons. To solve this, this person must tediously go through all the targets that we have, and set each "Team" to their own team.
How can we make it so that on each person's computer, the "Team" stays the same after pulling, but any other changes to MyProject.xcodeproj/project.pbxproj are applied?
Remarks:
Putting the entire MyProject.xcodeproj/project.pbxproj in .gitignore doesn't work, because that would ignore every other change to it. Adding a new file to the project, for example, also changes MyProject.xcodeproj/project.pbxproj, and we want to be able to pull that change.
Manually deselecting the lines that say "DEVELOPMENT_TEAM = ..." when committing is as tedious as reselecting the correct team every time, so that's not a solution.
I found this. Apparently, I can configure git to run sed before git checkout and git add. However, that answer seems to ignore the line by deleting it completely. This means that my friend, when he pulls, would still have to reselect the correct team. What I want is the kind of "ignore" that simply stops tracking that line. That is, if there is a local version of that line, use that.
I am also aware that this all wouldn't be a problem if we were on the same team. But if I understand this correctly, I can't have multiple people on my team unless I have a Company account, and not only can I not afford that, I don't own a Company.

I don't use Xcode itself and do not know how to smuggle Git hooks and scripts past the Xcode interface, so you'll need more than just this answer. But you mention sed in comments, and given your proposed file format, that may well be the way to go:
buildSettings = {
ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon;
ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor;
CODE_SIGN_STYLE = Automatic;
DEVELOPMENT_TEAM = <my team ID>; /* this is changed */
INFOPLIST_FILE = MyProject/Info.plist;
LD_RUNPATH_SEARCH_PATHS = (
"$(inherited)",
"@executable_path/Frameworks",
);
PRODUCT_BUNDLE_IDENTIFIER = io.github.sweeper777.MyApp;
PRODUCT_NAME = "$(TARGET_NAME)";
SWIFT_VERSION = 5.0;
TARGETED_DEVICE_FAMILY = 1;
};
Git has the ability to run what it calls clean and smudge filters. These can be used to run any arbitrary program you like, including sed, the "stream editor", which is particularly good at making single-line changes based on regular expression matches.
There is another method that may also work, and may "play better" with Xcode, or may play worse. I'll go over that too, after covering clean and smudge filters.
Before we dive into writing clean and smudge filters, and using them from Git—you'll need to know all of these details as you will have to write your own custom filters—we should start with a simple fact about Git commits: No part of any commit can ever be changed. Once you make a commit, the stuff that's inside the commit—the stored data in all of its files—is the way it is, forever. So these filters have to work within that system. Remember that, as it will help with understanding what we're doing.
How Git makes and stores objects
The files inside a commit are not files, exactly: they're not the same thing as files in your file system, at least. Instead, they are what Git calls objects, specifically blob objects. A blob object holds the file's data; other objects hold the file's name; and commit objects collect everything together to be used all at once. There's one more internal object type for annotated tags but we'll stop here as we're really only interested in the blob-object part.
When Git extracts a commit, it reads the internal blob objects and runs them through internal code to decompress and format them into regular files. This can include doing end-of-line hacking (turning LFs into CRLFs) if desired. Normally, all this happens entirely inside Git, and the end result is that Git writes out an ordinary everyday file for you to use. This ordinary file is what you will work on / with, in Xcode or any other editor and compiler system and so on. These ordinary files are in your working tree.
After you've extracted some commit, you'll do some work on it, by changing some or all of the files in your working tree, to achieve whatever result you wanted. This can include changing the buildSettings, editing Swift code, editing Objective-C code, and so on. You might add all-new files to the working tree, some of which you never commit at all (you can help make sure this never happens by listing such files in .gitignore).
Eventually, though, you'd like to commit the updated code. To do so, you must run git add, or maybe have your IDE run git add for you (perhaps Xcode has clicky buttons to do this). This invokes code in Git that converts the working tree file(s) back to internal blob objects if and as needed.
Again, normally this is all handled entirely inside of Git. Git will read the working tree file, maybe do CRLF-to-LF-only changes, compress the text, search for duplicate objects, and do all the other complicated things necessary to prepare the file, so that it is ready to be committed. The resulting data need not match what's in your working tree at all: it just has to be something that, when Git later goes to extract the file, produces what you will need in your working tree.
Clean and smudge filters
This is where these clean and smudge filters come in. I said, above, that normally, Git does the extraction and insertion all on its own. For binary files, the only thing Git does here is apply lossless compression.[1] For text files, Git can do CRLF/LF substitutions as well. But what if you'd like to do your own operations?
You can: Git will let you do whatever you want during the extract process with a smudge filter, and will let you do whatever you want during the compress process with a clean filter. The clean filter replaces the in-file data, using a stream-edit type process,[2] and then Git does its CRLF hacking if any and compressing on the "cleaned" data. The smudge filter replaces the decompressed, post-CRLF-hacking data coming out of Git with the data that should go into the working tree.
Hence you can write, as your clean filter, a sed script of the form:
s/DEVELOPMENT_TEAM = .*;/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/
With that as the entire sed script, what sed will do is edit the incoming data stream and replace any actual development team text with the word DEVTEAMTEMPLATE.
Your smudge filter has to work slightly harder: it must find the template line and adjust it so that it contains the correct team ID. Where will you get the correct team ID? That's up to you: perhaps you can store it in a file in your home directory, or in a file that you create in the working tree but never commit in Git. You'll have to write this one or two or however-many-liner sed and/or shell script yourself.
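As a rough sketch of that pair, assuming the team ID is kept on a single line in a file outside the repository: the function names, the TEAM_ID_FILE variable and the $HOME/.xcode-team-id default are all invented here for illustration. The clean pattern uses [^;]* instead of .* so that it stops at the first semicolon, and the smudge side assumes the team ID contains only letters and digits, so it is safe inside a sed replacement.

```shell
# Clean: replace whatever team ID is present with the placeholder.
clean_team() {
    sed 's/DEVELOPMENT_TEAM = [^;]*;/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/'
}

# Smudge: replace the placeholder with the local team ID, if one is configured.
smudge_team() {
    team_file=${TEAM_ID_FILE:-$HOME/.xcode-team-id}   # hypothetical location
    if [ -r "$team_file" ]; then
        team_id=$(cat "$team_file")
        sed "s/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/DEVELOPMENT_TEAM = $team_id;/"
    else
        # No local team ID: leave the template line as it is.
        cat
    fi
}
```

Both functions only read standard input and write standard output, which is what Git expects from a filter.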
[1] There are multiple phases of compression; git add does just one, and git checkout undoes all, including reading from "pack" files, as needed. The deeper level of compression, using delta encoding techniques, is entirely invisible at the "object" level, so nobody ever really has to think about it.
[2] With the advent of Git-LFS, Git gained the ability to run long-lived filters. Before that, Git always used simple stream filtering. The stream filtering is easier to understand, but is less efficient for doing en-masse operations on many files. Here, we're only interested in one file per repository anyway, so there's no need to go into the fancier long-lived filter details.
Defining clean and smudge filters
The tricky part here, with Git, is that you must define the filters in one place—in $HOME/.gitconfig or .git/config, for instance—and then tell Git to invoke them from another place, using the .gitattributes file. This is described in the gitattributes documentation. This documentation is pretty thorough, so read it. You can ignore all the long-running filter discussion, as noted above. I will quote one bit from the documentation here for emphasis, though, and expound on it:
Note that "%f" is the name of the path that is being worked on. Depending on the version that is being filtered, the corresponding file on disk may not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input.
When Git is running the smudge filter, it:
has opened some internal object (which may or may not be packed);
has decompressed it, or is in the process of decompressing it, and pumped / is-pumping out the data; and
this data is being fed to your filter, but is not written out to any file anywhere.
Your filter can use %f to know the name of the target output file, but the data are not in that file yet. The data bytes are only in some OS-level pipes or sockets or whatever your OS uses for connecting the output of one program (Git's internal decompressors) to another (your filter). Your smudge filter must read its standard input to get the data, and write the smudged data to standard output so that Git can read it (if necessary) and/or redirect that output to the correct file. Do not attempt to open the file by name!
(The same holds for the clean filter, except that in many cases, the input to your filter is just the raw data already in the file, so that opening the file and reading it mostly works. So this can mislead you, if you do your tests using a clean filter.)
Note that you can implement this scheme without a clean filter at all: your smudge filter can replace whatever is in the committed file even if it's a real team ID, rather than just a template. If you choose to do this, however, you'll "see" the team ID changing every time a different team-ID commits the file. The nice thing about using the clean filter is that once the committed copies of the file use the template line, every future cleaned file also uses the template line, so that it never changes.
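To make the two-places arrangement concrete, here is a hedged sketch that defines a filter in the local .git/config and then attaches it to paths through .gitattributes. The driver name devteam, the function name and the $HOME/.xcode-team-id file are assumptions for illustration only. If that file is missing, this smudge command would produce an empty team ID, so a real setup would want a guard.

```shell
# Run from the top level of the repository.
setup_devteam_filter() {
    # 1. Define the filter. This lands in .git/config, so it is local to this clone.
    git config filter.devteam.clean \
        "sed 's/DEVELOPMENT_TEAM = [^;]*;/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/'"
    git config filter.devteam.smudge \
        'sed "s/DEVELOPMENT_TEAM = DEVTEAMTEMPLATE;/DEVELOPMENT_TEAM = $(cat "$HOME/.xcode-team-id");/"'
    # 2. Tell Git which paths use it. .gitattributes can be committed and shared.
    printf '%s\n' '*.pbxproj filter=devteam' >> .gitattributes
}
```

Each collaborator has to run the git config part once per clone, because filter definitions are not transferred by clone or pull; only the .gitattributes line travels with the repository.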
Alternative: a template file
In general, it's unwise to commit actual configurations. Clean and smudge tricks can work, but they can only go so far: this particular file format works well because the change you want made is on a single line, and Git itself shows you file changes on a line-by-line basis, and sed works well with line-oriented input, and so on.
A lot of configuration files, though, wind up storing at least slightly-sensitive data, or perhaps very-sensitive data such as cleartext passwords. Such files should not be stored in Git at all if at all possible. Instead, you would store a template file in Git.
In this case, for instance, instead of storing MyProject.xcodeproj/project.pbxproj, you might have Git store MyProject.xcodeproj/project.pbxproj.template. This file would have template-ized contents. When you clone and check out the repository, you'd subsequently copy the template file into place and do any required adjustments.
Should the MyProject.xcodeproj/project.pbxproj file itself need to change, e.g., to acquire a new SWIFT_VERSION setting, you'd instead edit the template file, add that to Git, and commit. You would then use the usual "convert template to mine" process, or manually update the MyProject.xcodeproj/project.pbxproj file. Since this file is never committed—and is listed in .gitignore—it never goes into any commit and you never have to worry about collisions within it. Only the template file goes into Git.
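The "convert template to mine" step could be as small as the following sketch. It assumes the template marks the team ID with the same DEVTEAMTEMPLATE placeholder used earlier; the function name is hypothetical.

```shell
# render_pbxproj TEMPLATE OUTPUT TEAM_ID
# Copy TEMPLATE to OUTPUT, replacing every placeholder with TEAM_ID.
render_pbxproj() {
    template=$1
    output=$2
    team_id=$3
    sed "s/DEVTEAMTEMPLATE/$team_id/g" "$template" > "$output"
}

# Example invocation (not run here):
#   render_pbxproj MyProject.xcodeproj/project.pbxproj.template \
#                  MyProject.xcodeproj/project.pbxproj ABCDE12345
```

The catch with this approach is that changes made by the IDE land in the generated file, so they have to be copied back into the template by hand before committing.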

Bash reference to "parent" Git branch when creating branch off of another branch

Let's say I have a branch called parent-branch and I create a branch right off of that branch by doing
git checkout -b child-branch parent-branch
That's all fine and well of course, but what I am looking (hoping) to do, is to be able to somehow reference parent-branch from within a bash script. So for example, something like git_current_branch and git_main_branch will print the current local branch I am in, and print the master branch, respectively.
Is there a way where I can do something like git_parent_branch (or something along those lines) to have access to the parent-branch in bash and the command line. Whether that be a bash script function, or whatever other potential possibilities might work.
Is there something involving GitHub and / or any associated APIs perhaps where this is possible. I'm not overly familiar with connecting to GitHub other than just using their web interface, so anything in that respect would most likely be of big help (ideally pertaining to my issue here)!

In Git at a fundamental level, branches simply don't have parents. Well, I say simply, but it's not that simple, because we haven't defined branch, and users of Git use the word very loosely and often mean different—and contradictory, sometimes—things when they say it. So let's define branch name first, which at least has a simple, definite meaning:
A branch name is a name whose full spelling starts with refs/heads/, which—in order to exist—contains the hash ID of some existing, valid commit.
The last bit here—which is kind of redundant: an existing commit is valid, and a valid commit (whatever that means) must exist—is a concession to the fact that we can have branch names that don't exist (yet, or any more): xyzzy, for instance, is fine as a branch name, but until you create it, it's just a sort of potential branch name, floating in limbo as it were.
Because a branch name must contain a commit ID to exist, a new, empty repository—which has no commits—has no branch names either. And yet you're on the initial branch. It's in limbo, as yet nonexistent. When you make your first commit in this empty repository, then the branch name actually exists. If you like, you can re-create this special case in a non-empty repository using git checkout --orphan or git switch --orphan. (These are subtly different in how they manipulate Git's index, but both put you in this funky state of being on a branch that does not yet exist.)
This kind of special case aside, because a branch name has to contain some commit hash ID, we normally create a branch by picking some existing hash ID, just as in your example:
git checkout -b child-branch parent-branch
But what Git does with this is to resolve the name parent-branch to a commit hash ID first, then create a new branch—in this case, named child-branch—containing that hash ID. The two branch names have no parent/child relationship; we could run git checkout -b daddy kid or git checkout -b xyzzy plugh and there's no parent/child relationships here either, despite the misleading name in the daddy kid version and the neutral names in the xyzzy plugh case.
Now we come to your own question, though:
Is there a way where I can do something like git_parent_branch (or something along those lines) to have access to the parent-branch in bash and the command line.
Git contains, as a useful tool—parts of Git make use of this in various ways—a fully general string-based configuration system, where we run git config to set some arbitrary string to some arbitrary value. By convention, these strings have a hierarchical structure: user.name and user.email live within the user space; push.default is composed of push + default; and so on. Git even stores them using an INI-file-style syntax.
What this means is that although Git itself has no parent/child relationship, you can make up your own. There are a few obvious drawbacks to doing so:
Git won't maintain it for you.
You need to choose names that Git won't clobber, even in some future release (Git version 3.14 perhaps).
Nobody else will understand what the heck you're doing.
So, if you choose to do this, you're on your own—but let's note that Git does store some per-branch information in the branch.name namespace:
branch.xyzzy.remote is the remote setting for the branch named xyzzy;
branch.xyzzy.rebase is the git pull setting controlling whether the second command to use is git merge or git rebase, and depending on which second command is to be used, what flags, if any, to pass to that second command, when you're on branch xyzzy and you run git pull;
branch.xyzzy.description is the descriptive text that git format-patch will include in a cover letter, when run for branch xyzzy;
and so on. So if you were to add a branch.name.parent string value, you could store your string here. You then merely need to hope that the Git developers don't steal that name—parent—from you in the future.
Since this stuff is totally free-form, you'd just run git symbolic-ref or similar to find the current branch name, then git config --get branch.$branch.parent to get its parent setting, if it has one. If it does not have one, this must be a normal everyday parentless Git branch, rather than one of your own specially decorated branches that does have a nominal parent. To set the parent for some branch, you'd run git config branch.$branch.parent $parent, where $parent is the setting you want. (It's your decision as to whether $parent is required to be a branch name, in which case strings like xyzzy and main and plugh are fine, or whether it could be a remote-tracking name as well, in which case, you'd better use fully-qualified strings like refs/heads/xyzzy, refs/heads/main, and so forth. That will allow you to use refs/remotes/origin/main—a remote-tracking name—as a "parent".)
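Put together as shell functions, that could look like the sketch below. The function names are made up (chosen so they don't collide with any existing git_current_branch helper), and branch.<name>.parent is the self-invented key discussed above, not something Git itself knows about.

```shell
# Print the current branch name; fails when HEAD is detached.
git_current_branch_name() {
    git symbolic-ref --quiet --short HEAD
}

# Record "$1" as the nominal parent of the current branch.
git_set_parent_branch() {
    branch=$(git_current_branch_name) || return 1
    git config "branch.$branch.parent" "$1"
}

# Print the recorded parent of the current branch; fails if none was recorded.
git_parent_branch() {
    branch=$(git_current_branch_name) || return 1
    git config --get "branch.$branch.parent"
}

# Example session (not run here):
#   git checkout -b child-branch parent-branch
#   git_set_parent_branch parent-branch
#   git_parent_branch
```

Nothing sets the value automatically, so the git_set_parent_branch call has to follow every git checkout -b by hand or through a wrapper of your own.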
Is there something involving GitHub and / or any associated APIs perhaps where this is possible.
Definitely not, and this points up another weakness in the idea of using branch.$name.parent: there is no way to record this data on GitHub. It's a purely local setting. Then again, branch names are purely local: there's nothing that requires that you call your development branch dev or develop, even if the development branch name in some GitHub repository you've cloned is called dev or develop.
Before I finish this off, let me add several more definitions of branch. We'll need a few other definitions as well:
A branch tip is the commit to which a branch name points. That is, given some branch name like main that indicates some particular commit hash ID such as a123456..., the tip commit of branch main is a123456.... Checking out a branch by its name—with git checkout or git switch—and then adding a commit automatically stores the new commit's hash ID in the branch name, so that the tip commit automatically advances. The new commit's parent will be the old branch tip.
A branch (in one of its many meanings) is a set of commits that includes the tip commit of a branch (with branch here meaning name that contains a commit hash ID). Where this set of commits begins is in the mind of the user, but if left unspecified, Git generally includes every commit reachable from the tip commit.
To define reachable, see Think Like (a) Git.
A remote-tracking name is a name that exists in your Git repository but was created due to a branch name that your Git saw in some other Git repository. These names live in the refs/remotes/ namespace, which is further qualified by the remote, such as origin. For instance, refs/remotes/origin/main would be a remote-tracking name in your repository, in which your Git remembers the hash ID stored in origin's branch name main, the last time your Git got an update from their Git.
For some users, a remote branch is a branch (in the meaning of series of commits terminating at a tip commit) where the tip commit is given by a remote-tracking name. For other users—or the same user speaking at some other time—a remote branch is a branch that exists in some remote repository, such as origin. These two are easily conflated since your own origin/main tracks the other Git's main, hence the term remote-tracking name. (Git calls this a remote-tracking branch name, but the adjective remote-tracking in front of the noun name seems sufficient here.)
As you can see, the word branch is so loosely defined as to be nearly valueless. We can often reconstruct the correct definition—the one a speaker or writer had in mind—based on context, but for clarity, it's often better to use some other term.

How can I get diff between two git commits in a human-readable format?

I tried
git diff foo1 foo2 > diff.txt
but it seems too hard to read, especially when foo1 and foo2 have lots of commits between them. Is there any way I could get the difference between the commits in a human readable format, like one would see in an Xcode commit?

git diff is as human readable as it gets... but a graphical diff tool is probably what you are looking for.
As this question is tagged Xcode, you can view differences in Xcode with the appropriate view (per file), or use a full blown git client, for example SourceTree.

If you output to a diff file (.diff not .txt) you will probably get colour-coding, which makes it easier to read.
For graphical comparison of files there are many tools. I like Beyond Compare, but there are loads of others.

If SourceTree is not enough for you, I would recommend Beyond Compare, which lets you compare huge folder structures. Although it is not a free tool, it is worth the price.

How do I get a file manifest for each revision in a git repository?

I have a git repository that was created on Microsoft Windows. Microsoft Windows has a case insensitive file system. The people checking into this repository have not been careful about the case of their filenames. This means that the same directory or file sometimes shows up under two different names.
I mean to fix this problem. But in order to really fix it, I have to get a handle on it.
Is there a quick and simple way to get a list of the files at each revision?
I need this in order to figure out which revisions (if any) have the same file under two different names so I can decide on a strategy for fixing such cases. This means I need to get this information en masse as quickly as possible so the analysis consumes a reasonable amount of time.

One way to get this is with ls-tree:
git ls-tree -r --name-only <commit>
(Note that this looks at the portion of the tree corresponding to your current directory, so you should either run it from the top level of your repo, or give the --full-tree option.)
This is essentially instantaneous, since all Git has to do is recursively examine the tree; it doesn't even have to look at the contents of files.
I'm not sure how you're going to use a list of filenames to detect the same file under two different names. If you just mean that you want to look for filenames that would be the same on a case-insensitive filesystem, then the list of filenames is all you need.
However, if you think the files might actually have the same content, you could drop the --name-only, so that you'll also see the SHA1s of all the files, and can find identical files by looking for duplicate hashes.
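For the case-insensitive check, a rough sketch over every commit might look like this. The function names are invented; -t is added so that directory names are listed too, which lets differently cased directories show up as collisions. Paths containing unusual characters are quoted by ls-tree and may need extra handling.

```shell
# Read paths on stdin; print (lowercased) each path that occurs more than
# once when letter case is ignored.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# For every commit reachable from any ref, report case-insensitive collisions.
scan_all_commits() {
    git rev-list --all | while read -r commit; do
        dups=$(git ls-tree -r -t --name-only --full-tree "$commit" | case_collisions)
        if [ -n "$dups" ]; then
            printf 'commit %s\n%s\n' "$commit" "$dups"
        fi
    done
}
```

Running scan_all_commits from inside the repository prints one block per affected commit and nothing for clean ones.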

You could run something like this:
git log --name-only --pretty="format:%H"
This command will show the SHA-1 and the list of changed files for every revision.

Git rename detection when class and filename changed in one commit

What is the best way to handle class renames (e.g. done with Resharper) with Git?
That is, if both the class name and containing file name are changed together and committed without further changes.
It seems the way Git handles renames via a percentage changed heuristic is a bit hit and miss.
For large classes it will be recognised as a rename but for small classes the percentage threshold is reached such that it will be seen as a delete and add.

Keep in mind that in Git's history, file renames are not stored as "this was renamed from X to Y". Instead, the file X exists in one revision, and in the next revision Y exists (and X doesn't). For example:
Revision | Files
---------+----------------------------------
HEAD^ | a.cpp x.cpp z.cpp
HEAD | a.cpp y.cpp z.cpp
In the above diagram, each revision is a row and each contains three files. Between the two revisions, x.cpp was renamed to y.cpp. The only information that the repository stores is the contents of each separate revision.
When Git (or another tool that reads Git repositories) looks at the above history, it notices that y.cpp is a new file in HEAD. Then it looks at the previous revision to see whether a similar file existed. In the case of a straight file rename, then yes, a file called x.cpp with the identical contents existed in the previous revision (and no longer exists in the current revision). So the new file is shown as a rename from x.cpp to y.cpp.
In the case of a rename-and-modify, Git will look at the previous revision's files to see if one file looks close to the new file (in terms of its contents). This is where the heuristic comes in. If most of the lines are the same, then Git will show it as a rename, but if there are enough changed lines compared to unchanged lines, then Git will simply say it looks like a new file.
To answer your question, the best way to handle ReSharper class renames is to simply do it and commit the new files. Git stores the old and the new files in its repository. Rename detection is handled later, at the time you actually ask about the history. This is why commands such as git log have options like --find-renames, --find-copies and --find-copies-harder.
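Because detection happens at query time, the similarity threshold can be chosen at query time as well: git diff and git log accept -M<n> or --find-renames=<n>, with 50% as the default. A small sketch of a helper that lowers it for small classes (the function name is hypothetical):

```shell
# Show name/status changes between two commits with a chosen rename threshold.
# Usage: show_renames OLD NEW [THRESHOLD]   (THRESHOLD defaults to Git's 50%)
show_renames() {
    git diff --name-status --find-renames="${3:-50%}" "$1" "$2"
}

# Example invocation (not run here):
#   show_renames HEAD^ HEAD 25%
```

A renamed file appears as a line starting with R followed by the similarity score; below the threshold the same change is listed as a delete (D) plus an add (A). Setting the threshold too low risks pairing unrelated files.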

Years later this is still a quirk I guess because it's fundamental to the way git works.
What I do now is do the rename, commit, then do the change. Bit annoying with refactoring tools but no other solution retains history (--find-copies and --find-copies-harder don't seem to work).
