I have imported a rather large repository from another SCM into git. Unfortunately the migration was done (had to be) on Windows and every file got committed into git with the execute bit set. To avoid having to do the migration again (it is a long and hang-prone process) I am trying to figure out if I can clean out the executable bit server side. My thought is using git filter-branch somehow combined with git update-index, but I could take hints as to how to proceed.
Doing a huge commit at the end clearing all executable bits is not a solution -- I don't want every file to have a bump in the history.
This seems to do the trick:
git filter-branch --index-filter 'git ls-files -s |
sed "s/^100755/100644/" |
git update-index --index-info' -- --all
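The sed stage is the only part of that filter that changes anything. The sketch below runs the same expression over a few made-up entries in the layout that git ls-files -s prints, which is also the layout git update-index --index-info reads back. Only a leading 100755 is rewritten, so symlinks (120000) and submodule entries (160000) pass through unchanged.

```shell
#!/bin/sh
# Made-up sample entries in the `git ls-files -s` layout:
#   <mode> SP <object-id> SP <stage> TAB <path>
# (e69de29... is the empty blob; it is only a placeholder here.)
sample=$(printf '%s\t%s\n' \
  '100755 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0' 'build.sh' \
  '100755 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0' 'src/main.c' \
  '120000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0' 'link-to-main')

# Same substitution as in the index filter: only a leading 100755 changes.
rewritten=$(printf '%s\n' "$sample" | sed "s/^100755/100644/")
printf '%s\n' "$rewritten"
```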
Your solution is quite good, but there is another possibility: git config core.filemode false:
http://git-scm.com/docs/git-config
core.fileMode
If false, the executable bit differences between the index and the working copy are ignored; useful on broken filesystems like FAT. See git-update-index(1).
The default is true, except git-clone(1) or git-init(1) will probe and set core.fileMode false if appropriate when the repository is created.
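Bear in mind that core.fileMode only affects how Git compares the index with the working copy: the 100755 modes already recorded in the commits stay as they are, and because the setting lives in each repository's local config, git clone does not copy it (clone only sets it on its own when it detects a filesystem that needs it). So your solution is probably better, but I thought I'd throw this out there as it may be more appropriate for someone else's use case...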
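To see the difference in practice, here is a sketch on a throwaway repository, assuming a Unix-like filesystem that actually stores the executable bit (tool.sh is an arbitrary file name): with core.fileMode false, git status ignores a chmod +x, yet the mode recorded in the index stays 100644, and switching the setting back to true makes the change visible again.

```shell
#!/bin/sh
# Sketch, assuming a Unix-like filesystem that stores the executable bit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.fileMode true          # make the starting point explicit
printf 'echo hi\n' > tool.sh           # arbitrary example file, not executable
git add tool.sh
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

chmod +x tool.sh                       # flip the bit in the working copy only

git config core.fileMode false         # local, per-repository setting
status_ignored=$(git status --porcelain)                 # mode change ignored
index_mode=$(git ls-files -s tool.sh | cut -d' ' -f1)    # index keeps 100644

git config core.fileMode true
status_reported=$(git status --porcelain)                # mode change reported
```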
Our team works on a repository which includes one directory containing files with the pipe character "|" in their names. I'm the only one on Windows, where the pipe character is illegal in file names.
Is there a way to rename files in the directory "extra" from, e.g., "2021|08|05" to "2021\08\05" when I run "git pull"?
Is there a way to rename files in the directory "extra" from, e.g., "2021\08\05" to "2021|08|05" when I run "git push"?
No.
No.
These are the right answers to the question you've asked, but the trick is, you've asked the wrong question. Alas, the answer to the right question is: "yes, but it's a horrible solution". The question is: Is there a solution to the problem of bad / invalid characters in file names stored in Git commits?
In the early days of Git, when it was a collection of shell scripts that only a few people could use successfully, 😀 one would obtain new commits from elsewhere with git fetch, then read these commits into Git's index with git read-tree.
Git's index, which Git also calls the staging area or sometimes the cache, can hold these file names. In fact, even on Windows, Git's index can hold files named aux.h, which Windows won't let you create. The index has no folders either: it just has files with names with embedded (forward) slashes, such as path/to/file. Git can hold two different files, one named README and one named readme, in its index. Windows can't have two different files whose names differ only in case.
So, Git's index / staging-area can hold these files just fine. The problem comes when you go to work with the files. Files that are in Git's index are stored there in a special Git-only format, as what Git calls a blob object. You cannot read or write a blob object directly. You have to use yet more Git commands to do this. It's terribly inconvenient.
To use Git conveniently, then, we normally don't use all the individual one-step-at-a-time internal Git operations: we use some sort of higher level, user oriented command, like git checkout. We check out an entire commit: Git will find all the files that are stored in that commit, read them into Git's index, and copy out and expand all the internal Git-only blob objects into ordinary files, with ordinary file names.
This step (copying files out of Git's index to make them usable) is where things go wrong on Windows. The file's name in Git's index is, say, path/to/2021|08|05. Git recognizes that path/to/ has to be turned into two folders, path\ and path\to\, on Windows, so that Git can create a file in the second folder. Unfortunately, Git has no way to remap the 2021|08|05 part. That part is going to stay 2021|08|05, and as you have seen, Git can't create a file with that name: the OS just says "no".
What you can do, at this point, is drop down to those lower-level commands. You can run:
git rev-parse ":path/to/2021|08|05"
Keep the quotes: an unquoted | is the pipe operator in sh/bash, cmd.exe and PowerShell alike, so without them the shell would split the command at each |.
This git rev-parse command will show the blob hash ID for the file. You can then access the file's contents with:
git cat-file -p <hash>
which prints those contents to the standard output. If your shell supports redirection, you can then redirect the output to a file whose name is your choice. This lets you see and use the file's contents.
The git cat-file -p command can take the index path name directly, so:
git cat-file -p ":path/to/2021|08|05" > path/to/2021-08-05
is a way to extract the file to a usable name.
Unfortunately, git add—which is how you would normally update the file—will insist on using the name you gave the file in the file system. Once again, you must fall back on internal Git plumbing commands to work around this. If you need to update that particular file, for instance, you would:
run git hash-object -w -t blob path/to/2021-08-05 to turn the updated file's data into an internal Git object;
run git update-index --cacheinfo with the entry's mode, the hash ID obtained in the previous step, and the path path/to/2021|08|05, to make Git update the index entry for that path.
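Here is a sketch of that extract / edit / re-stage cycle. It assumes a Unix-like system, where | is a legal file-name character, so it can first commit such a file and then delete it from the working tree to mimic what Windows leaves you with. The mode 100644 assumes the original entry was a regular, non-executable file; the repository and its contents are made up for the demo.

```shell
#!/bin/sh
# Sketch on a Unix-like system, where '|' is legal in file names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p path/to
printf 'old\n' > 'path/to/2021|08|05'
git add 'path/to/2021|08|05'
git -c user.name=demo -c user.email=demo@example.com commit -q -m init
rm 'path/to/2021|08|05'              # pretend the OS never let us create it

# Extract the blob from the index under a usable name.
git cat-file -p ':path/to/2021|08|05' > path/to/2021-08-05

# Edit the copy, then write its contents into the object database.
printf 'new\n' >> path/to/2021-08-05
blob=$(git hash-object -w -t blob path/to/2021-08-05)

# Point the index entry for the original name at the new blob
# (assumed mode 100644: a regular, non-executable file).
git update-index --cacheinfo 100644 "$blob" 'path/to/2021|08|05'
staged=$(git cat-file -p ':path/to/2021|08|05')
```

After this, git commit records the updated contents under the original name, even though no file with that name exists on disk.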
Once all of this is done, you can go back to normal Git commands, because git commit makes a new commit from what's in Git's index / staging-area.
The (rather large) drawback here is that you cannot use a lot of normal everyday Git commands:
git pull is often a no-go because it runs git rebase or git merge, both of which need to use your working tree (OS-level files). Run git fetch first, then do as much manual work as needed.
git checkout will fail: you can use it, but then you must manually do something about each of the bad file names that are now in Git's index.
git diff will show differences that include deleting the files with the bad names, and git status will show the adjusted-name files as untracked files (because they are).
git add of any changes you need to make to these files is also a no-go; use git hash-object -w and git update-index instead.
git rebase and git merge become difficult. You can probably deal with them the same way as with git checkout and git add above, but that's painful at best.
I'm working on a project that's hosted on GitLab and uses issue/work branches and merge requests to bring that work into the master branch when it's done. Usually I work on issue branches. When it has been merged by GitLab, I need to switch to the current master to do a build, locally.
My workflow is this:
Switch to master
Pull from remote (--ff-only)
Remove stale remote tracking branches
Also remove their local tracking branches
There's also a client-side tool that watches the code directory and updates some files (CSS, JavaScript). When it sees a change in the first step (switch to master), I first need to wait for it to finish before going on (to avoid confusion). If there's a difference between the issue branch and the old master, there's a good chance that the difference will disappear when updating master (as that issue branch is now merged).
I'm looking for a way to switch to the already-updated master branch in one step. How can I do that with a git command? I want to bundle up all these actions in a batch file to avoid repeating all those manual steps in TortoiseGit every time.
This question is different from the suggested one in that the local master branch already exists. I'm not switching to a new branch from a remote, but to a branch that already exists and is just behind the remote.
TL;DR
Unless you write your own script (or use a Git alias to run multiple commands and/or scripts), you can't get this down to a single command, but you can get closer. See the long section for many caveats: the biggest one is that it assumes you're not already on master when you do it. If you are, the second step won't work (see the long section for what will).
git fetch -p &&
git fetch . refs/remotes/origin/master:refs/heads/master &&
git checkout master
will take care of the first three bullet points—not in the same order—with a single work-tree-updating git checkout step.
(Note that I split this into three lines for posting purposes, but as a Git alias using !, it's really all one big line.)
Long
There are several approaches, including actual, literal batch files (shell scripts on Unix-like systems, or .BAT files, or whatever) and aliases (as suggested by Joe in a comment).
There's also a client-side tool that watches the code directory and updates some files ...
This is ... not necessarily a good idea, let's say. :-)
While git checkout master runs, it's changing various files. Let's say that for some reason, it changes one of several files that the watcher watches, but then it pauses for a few minutes (or seconds, or microseconds, or some unit of time anyway). While it is paused, the watcher tries to combine the multiple files that are now out of sync.
Maybe this is OK and self-correcting when Git un-pauses and finishes the checkout—but it might be better if you could make sure the update only happens when the checkout is done.
That aside, let's take a look at this particular series of commands, and be very concrete about which Git command you're using:
Switch to master
I assume this is git checkout master.
Pull from remote (--ff-only)
I assume this is git pull origin master --ff-only or perhaps just git pull --ff-only.
Remove stale remote tracking branches
I'll assume for now that this is git fetch --prune. If you are doing something different, you should include that in your question.
Also remove their local tracking branches
If I understand what you mean, this requires a script. Note that this is somewhat dangerous: suppose you have your own branch X on which you are doing development. This X is not related to anyone else's X. Then someone creates their own X—using the same name—and sends it to the machine from which you git fetch. You now have origin/X. Then they delete their X (because they're done with it) and delete origin/X. If you now have your script delete your X, because origin/X went away, that would probably be bad.
If you only delete your X when it explicitly has origin/X set as its upstream, this particular case won't occur—but if someone accidentally deletes your origin/X thinking it was their origin/X, the same problem crops up again, and this time that particular protection does not work.
Anyway, with all that aside, let's look at the variant I suggested above.
git fetch -p
This updates all your origin/* names[1], including origin/master, without affecting any files in your working tree. The -p is short for --prune, so it deletes any origin/* names that no longer have a corresponding branch in the Git over at the URL stored under the name origin.
[1] I assume here that you have only one remote, which is named origin. If you have more than one remote, use git fetch origin -p to make sure you're fetching specifically from the one named origin. I also assume you have not configured your Git to be a single-branch clone.
git fetch . refs/remotes/origin/master:refs/heads/master
This rather magic-looking command tells your Git to call itself up. That is, the special name . refers to your own Git repository. We are using this to trick your Git into fast-forwarding your master branch based on your updated origin/master. The final argument is what does this: we say to your Git: OK, my Git, when you talk to that other Git, find out what commit its refs/remotes/origin/master identifies. Then, if that's a fast-forward operation, update my refs/heads/master to match.
Of course, the "other Git" your Git is talking to is itself, so this means fast-forward my master from my origin/master.[2] It's roughly equivalent to:
git checkout master && git merge --ff-only origin/master && git checkout -
except that no actual checking-out occurs: no files in your work-tree change.
[2] You might wonder why some of these use origin/master and some use refs/remotes/origin/master. The longer one is just the full spelling of the name. When using git fetch, it's wise to use the full spellings. In fact, in general, in scripts, you might want to use full spellings more often, but specifically git fetch can become confused if the other Git you talk to accidentally has both a branch and a tag with the same name, for instance. So I'm illustrating the full names with git fetch. You'll use it to talk to your own Git, so if you don't mix up your tags and branch names or otherwise create ambiguity, you won't actually need the full names. But it's a good habit with git fetch.
The above fails if you're on your master
The git fetch command will refuse to fetch into whatever branch name you have checked out. So if you are on master, this git fetch . trick will fail.
In a way, this is OK! If you are on your master, what you should do instead is run:
git merge --ff-only origin/master
or anything equivalent. This is what your git pull --ff-only does: first it runs git fetch (without the -p and limited to fetching only the other Git's master); then it runs git merge --ff-only.
A more complete version
A more complete version of this sequence, then, is to first check: Which branch am I on? To do that, you can use either of two Git commands:
git rev-parse --abbrev-ref HEAD
or:
git symbolic-ref --short HEAD
Both of these will print master if you are currently on your own master branch. The difference between them is what they do if you're on no-branch-at-all: e.g., in the middle of a rebase, when you are in "detached HEAD" state. In that case, the second command—the git symbolic-ref one—errors out, while the first one just prints HEAD.
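The difference is easy to check in a throwaway repository: on a branch both commands print master, while with a detached HEAD git rev-parse prints HEAD and git symbolic-ref exits with a non-zero status (the -q used below only silences its error message).

```shell
#!/bin/sh
# Sketch in a throwaway repository with a single empty commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master     # don't depend on init.defaultBranch
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init

on_branch_revparse=$(git rev-parse --abbrev-ref HEAD)
on_branch_symref=$(git symbolic-ref --short HEAD)

git checkout -q --detach                    # enter "detached HEAD" state
detached_revparse=$(git rev-parse --abbrev-ref HEAD)
if git symbolic-ref -q --short HEAD >/dev/null; then
  detached_symref_failed=no
else
  detached_symref_failed=yes                # HEAD is not a symbolic ref now
fi
```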
If you'd like to avoid doing any of this when in such a state, use the second command and check for failure. Otherwise, use the first one. I'll illustrate just the first one here:
if test "$(git rev-parse --abbrev-ref HEAD)" = master; then
    # already on master - use alternative strategy
    git fetch -p && git merge --ff-only refs/remotes/origin/master
else
    # not currently on master: use fancy tricks to update
    git fetch -p &&
    git fetch . refs/remotes/origin/master:refs/heads/master &&
    git checkout master
fi
The above, while untested, should be suitable as a shell script. If you have Git installed, you have the ability to run shell scripts—or you can turn the above into a very long Git alias, using ! and the appropriate set of semicolons.
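As a sketch of the alias route, here is the script above squeezed into a single ! alias and exercised against a throwaway "upstream" repository that stands in for GitLab. The alias name sync-master is made up, and it is stored in the demo repository's own config rather than globally. Both cases are covered: starting on another branch (the fetch-into-self trick) and starting on master (the plain fast-forward merge).

```shell
#!/bin/sh
# Sketch: a one-line "!" alias, tested against a throwaway upstream repo.
set -e
base=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$base/upstream"
git -C "$base/upstream" symbolic-ref HEAD refs/heads/master
git -C "$base/upstream" commit -q --allow-empty -m one
git clone -q "$base/upstream" "$base/work"
cd "$base/work"

# "sync-master" is a hypothetical alias name, set in this repo's config only.
git config alias.sync-master '!if test "$(git rev-parse --abbrev-ref HEAD)" = master; then git fetch -p && git merge --ff-only refs/remotes/origin/master; else git fetch -p && git fetch . refs/remotes/origin/master:refs/heads/master && git checkout master; fi'

# Case 1: on another branch while upstream master gains a commit.
git checkout -q -b topic
git -C "$base/upstream" commit -q --allow-empty -m two
git sync-master >/dev/null 2>&1
branch_after_1=$(git symbolic-ref --short HEAD)
master_1=$(git rev-parse master)
upstream_1=$(git -C "$base/upstream" rev-parse master)

# Case 2: already on master while upstream master gains another commit.
git -C "$base/upstream" commit -q --allow-empty -m three
git sync-master >/dev/null 2>&1
master_2=$(git rev-parse master)
upstream_2=$(git -C "$base/upstream" rev-parse master)
```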
I recently updated Git to version 2.7.2.windows.1 (I am running Windows 7 64-bit). Since the update, I have been unable to run git add with the -p option on files within a certain directory (or its subdirectories) whose name is _ (an underscore).
git status correctly reports that my file has changes:
PS C:\Users\Carl\www\dl> git status
On branch develop
Your branch is up-to-date with 'origin/develop'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: _/php/class.Menu.php
And I can add the entire file with a simple git add, or by specifying the file by name. But if I try to include the -p or --patch option (both variations produce the same results), Git reports that there are no changes:
PS C:\Users\Carl\www\dl> git add -p .\_\php\class.Menu.php
No changes.
This only happens for files within the _ directory. Even if I cd into that directory, so that the path I pass to git add no longer contains an underscore, it still doesn't work:
PS C:\Users\Carl\www\dl\_\php> git add -p .\class.Menu.php
No changes.
I had initially thought this problem was related to a similar one I encountered recently on files within the _ directory, which I asked about here. However, that problem appears to have been related to Posix path conversion in MinGW, whereas this problem occurs whether I use Git Bash, Windows PowerShell, or cmd.exe.
As I said in that previous question, I believe underscores to be valid in file/directory names. Additionally, I am not the owner of the project so I cannot rename the directory or move the file.
Is this a bug in Git? Are there any additional steps I can take to determine what the underlying issue is?
Well, I was able to reproduce this, and seems that it is the same POSIX-to-Windows path conversion. ProcessMonitor shows that git (actually, perl run by git) looks for a file C:\Program Files\Git\php\class.Menu.php.
To work around this (at least, this worked for me), according to the documentation, you can set the environment variable MSYS_NO_PATHCONV temporarily, like so (in Git Bash):
MSYS_NO_PATHCONV=1 git add -p _/php/class.Menu.php
(I don't know how to set env variables in windows' cmd/powershell, but that should be possible, too.)
You shouldn't enable MSYS_NO_PATHCONV globally/permanently (e.g. using export in Git Bash or modifying Windows' user/system environment variables in system settings), because that can lead to unwanted effects, and it will probably break many more things than it fixes (see this SO comment). Actually, the Git for Windows developers warn against enabling MSYS_NO_PATHCONV even temporarily.
Having said that, I'm starting to think that OP's problem is a git-for-windows bug and should be reported as such (might have something to do with the fact that git-add is a binary, but git-add--interactive is a perl script).
Another listed workaround is to double the first slash, like git add -p _//php/class.Menu.php (or does that mean the parameter must start with a double slash?), but that doesn't seem to work due to complex intermediate path conversions, that happen between the invocation of git add and the real file access.
I'd try without the leading .\ in the path. Also, I've never passed a filename to git add -p. I just make my change and run that as is. I would also check that the changes you're making are in fact being applied to that specific file, and that the file is being touched.
I have an XML file that we consider binary in git. This file is externally modified and committed.
I don't care about who edited it and what's new in the file. I just want to have the latest file version at every pull. Currently, every git pull gives me a merge conflict.
I just want this file to be overwritten on every git pull, without manually running commands like git fetch/checkout/reset every time I have to sync my repo.
Careful: I want to overwrite just that file, not every file.
Thanks
I thought you could use Git Hooks, but I don't see one running before a pull...
A possible workaround would be to make a script to delete this file and chain with the needed git pull...
This answer shows how to always select the local version for conflicted merges on a specific file. However, midway through the answer, the author describes also how to always use the remote version.
Essentially, you have to use git attributes to specify a specific merge driver for that specific file, with:
echo binaryfile.xml merge=keepTheirs > dir/with/binary/file/.gitattributes
git config merge.keepTheirs.name "always keep their file during merge"
git config merge.keepTheirs.driver "keepTheirs.sh %O %A %B"
git add -A
git commit -m "commit file for git attributes"
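and then create keepTheirs.sh somewhere in your $PATH and make it executable (chmod +x keepTheirs.sh):
#!/bin/sh
# $2 is %A (the current version; Git reads the merge result from it), $3 is %B (the other side's version)
cp -f "$3" "$2"
exit 0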
Please refer to that answer for a detailed explanation.
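Here is a sketch of the whole setup in a throwaway repository, which may help when checking it before touching a real one. It differs from the answer in two deliberate ways: the attribute lives in a top-level .gitattributes, and the driver is configured with an absolute path to keepTheirs.sh instead of relying on $PATH. The branch name external is a made-up stand-in for the externally edited copy.

```shell
#!/bin/sh
# Sketch: a "keep theirs" merge driver, exercised in a throwaway repo.
set -e
base=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Driver: $1 = %O (ancestor), $2 = %A (ours; Git reads the result from it),
# $3 = %B (theirs). Exit status 0 tells Git the merge succeeded.
cat > "$base/keepTheirs.sh" <<'EOF'
#!/bin/sh
cp -f "$3" "$2"
exit 0
EOF
chmod +x "$base/keepTheirs.sh"

git init -q "$base/repo"
cd "$base/repo"
git symbolic-ref HEAD refs/heads/master
echo 'binaryfile.xml merge=keepTheirs' >> .gitattributes
git config merge.keepTheirs.name "always keep their file during merge"
git config merge.keepTheirs.driver "$base/keepTheirs.sh %O %A %B"
echo base > binaryfile.xml
git add -A && git commit -q -m base

git checkout -q -b external            # stand-in for the external edits
echo theirs > binaryfile.xml && git commit -q -am theirs
git checkout -q master
echo ours > binaryfile.xml && git commit -q -am ours

git merge -q --no-edit external        # the driver resolves the conflict
result=$(cat binaryfile.xml)
```

Note that Git only calls a merge driver when both sides changed the file; if only one side did, that side's version is taken without running the driver.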
If the changes to your files are not actual changes, you should not submit them. This will clutter your version history and cause numerous problems.
From your statement I’m not quite sure which is the case, but there are 2 possibilities:
The file in question is a local storage file, the contents of which are not relevant for your actual sourcecode. In this case the file should be part of your .gitignore.
This file is actually part of your source and will thus have relevant changes in the future. By setting up the merge settings like you are planning to do, you will cause trouble once this file actually changes, because merges will then be destructive.
In this case the solution is a little bit more complicated (apart from getting a fix for the crappy tool that changes stuff it doesn’t actually change …). What you are probably looking for is the assume unchanged functionality of git. You can access it with this command:
git update-index --assume-unchanged <file>
From the Git documentation (git help update-index):
You can set "assume unchanged" bit to paths you have not changed to cause git not to do this check. Note that setting this bit on a path does not mean git will check the contents of the file to see if it has changed — it makes git to omit any checking and assume it has not changed. When you make changes to working tree files, you have to explicitly tell git about it by dropping "assume unchanged" bit, either before or after you modify them.
My goal involves having a file with the same name but different implementations in different branches. For example, I want to develop in a branch with verbose mode and another that works silently. Or, one branch uses a list, but the other uses a hash. Similar to prior question.
In my case, the changes are in a file with the same name. Unfortunately, checkout from one branch to the other merges the files of the same name (content?). In that case, the release version inherits the verbose print statements I had hoped to keep separate.
I learned and succeeded in using stash save; checkout; (edit other branch, add, commit); checkout back; and stash apply (to erase merge changes caused by checkout). It works, but the manual's examples (interrupted workflow, partial commits) suggest this is not the intended workflow. Creating an orphan branch for verbose destroys the history. Is there another way to switch between branches without carrying unintended changes to files with the same name?
Update: I can't replicate the behavior any longer, despite seeing it five times before submitting here. It used to show the text below. But I guess this question should be closed.
$ git checkout master
M Test.java
Switched to branch 'master'
I think the following command is what you are looking for:
git update-index --assume-unchanged <file>
To undo run:
git update-index --no-assume-unchanged <file>
From "Difference Between 'assume-unchanged' and 'skip-worktree'", I would go with:
git update-index --skip-worktree -- <file>
git update-index --no-skip-worktree -- <file>
skip-worktree is useful when you instruct git not to touch a specific file ever.
That is handy for an already tracked config file.
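To see what the two bits do, here is a sketch in a throwaway repository (file names are arbitrary). After both files are edited, git status reports nothing, and git ls-files -v marks the assume-unchanged file with a lowercase h and the skip-worktree file with S. Clearing the bits makes the edits visible again.

```shell
#!/bin/sh
# Sketch in a throwaway repository with two tracked files.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > assumed.txt
echo b > skipped.txt
git add assumed.txt skipped.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

git update-index --assume-unchanged assumed.txt
git update-index --skip-worktree -- skipped.txt
echo changed >> assumed.txt
echo changed >> skipped.txt

st=$(git status --porcelain)       # both edits are hidden
flags=$(git ls-files -v)           # "h" = assume-unchanged, "S" = skip-worktree

git update-index --no-assume-unchanged assumed.txt
git update-index --no-skip-worktree -- skipped.txt
st_after=$(git status --porcelain) # both edits are visible again
```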