Prevent large text files from being added to a commit when using GitHub - bash

We want to prevent:
Very large text files (> 50MB per file) from being committed to git instead of git-lfs, as they inflate git history.
Problem is, 99% of them are < 1MB, and should be committed for better diffing.
The reason for the variance in size: these are YAML files, which support binary serialization via base64 encoding.
The reason we can't reliably prevent binary serialization: this is a Unity project, and binary serialization is needed for various reasons.
Given:
GitHub's lack of pre-receive hook support.
git-lfs's lack of support for a file size attribute.
Questions:
How can we reliably prevent large files from being added to commit?
Can this be done through a config file in repo so all users follow this rule gracefully?
If not, can this be done by bash command aliasing so trusted users can see a warning message when they accidentally git add a large file and it's not processed by git-lfs?
(Our environment is macOS. I have looked at many solutions and so far none satisfy our needs)

Alright, with help from CodeWizard and this SO answer, I managed to create a good guide myself:
First, set your repo's core.hooksPath with:
git config core.hooksPath .githooks
Second, create this pre-commit file inside the .githooks folder so it can be tracked (gist link), then remember to give it execute permission with chmod +x.
#!/bin/bash
#
# pre-commit hook: abort the commit if any staged file that is NOT tracked
# by git-lfs (according to .gitattributes) is larger than FILE_SIZE_LIMIT_KB.
# Requires bash (process substitution and read -d '').

# Redirect output to stderr.
exec 1>&2

FILE_SIZE_LIMIT_KB=1024
COLOR='\033[01;33m'
NOCOLOR='\033[0m'
COUNTER=0

# Only added, copied, modified or renamed files: deleted files have no staged
# content. -z keeps paths with spaces or other unusual characters intact.
while IFS= read -r -d '' file; do
    # Skip files routed through git-lfs by .gitattributes
    # (check-attr prints "<path>: filter: lfs" for those).
    if [ "$(git check-attr filter -- "$file" | awk '{print $NF}')" = "lfs" ]; then
        continue
    fi
    # Size in bytes of the staged blob, i.e. what will actually be committed
    # (entries without a blob, such as submodules, are skipped).
    file_size=$(git cat-file -s ":$file" 2>/dev/null) || continue
    file_size_kb=$((file_size / 1024))
    if [ "$file_size_kb" -ge "$FILE_SIZE_LIMIT_KB" ]; then
        printf '%b%s%b has size %sKB, over commit limit %sKB.\n' \
            "$COLOR" "$file" "$NOCOLOR" "$file_size_kb" "$FILE_SIZE_LIMIT_KB"
        COUNTER=$((COUNTER + 1))
    fi
done < <(git diff --cached --name-only -z --diff-filter=ACMR)

# Exit with an error if any non-LFS file is over the size limit.
if [ "$COUNTER" -gt 0 ]; then
    echo "$COUNTER file(s) are larger than permitted, please fix them before committing."
    exit 1
fi
exit 0
Now, assuming both .gitattributes and git-lfs are set up properly, this pre-commit hook will run when you git commit and make sure that every staged file not tracked by git-lfs (as specified in your .gitattributes) satisfies the specified file size limit.
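The question of "is this path handled by git-lfs?" can also be put to git directly: git check-attr reports how .gitattributes applies to a path, which lets git do the glob matching instead of translating patterns into regexes. A minimal sketch, assuming it runs inside the work tree; the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if .gitattributes routes the given
# path through the git-lfs filter. Must be run inside a git work tree.
is_lfs_tracked() {
    # Output looks like "<path>: filter: lfs" (or "... unspecified"); the value
    # is the last field, which also works for paths containing spaces.
    [ "$(git check-attr filter -- "$1" | awk '{print $NF}')" = "lfs" ]
}

# Usage example: list staged files that would bypass git-lfs.
#   git diff --cached --name-only | while IFS= read -r f; do
#       is_lfs_tracked "$f" || echo "not in LFS: $f"
#   done
```

The path does not have to exist on disk; check-attr only evaluates the attribute rules.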
Any new users of your repo will need to set up core.hooksPath themselves, but beyond that, things should just work.
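Because core.hooksPath is stored in each clone's own .git/config, one way to make that per-user step painless is a small setup helper committed alongside the hooks. A sketch, assuming the hooks live in .githooks/ at the repository root; the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical one-time setup helper for a fresh clone.
# Assumes the tracked hooks live in .githooks/ at the repository root.
setup_hooks() {
    local root
    root=$(git rev-parse --show-toplevel) || return 1
    # Point git at the tracked hooks folder (written to this clone's .git/config).
    git -C "$root" config core.hooksPath .githooks || return 1
    # git only runs hooks that are executable.
    chmod +x "$root"/.githooks/* || return 1
    echo "core.hooksPath set to $(git -C "$root" config core.hooksPath)"
}
```

Alternatively, committing the hook with its executable bit (git update-index --chmod=+x .githooks/pre-commit) makes the chmod step unnecessary on macOS and Linux clones; only the config line remains.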
Hope this helps other Unity developers fighting with growing git repo size!

How can we reliably prevent large files from being added to commit?
Can this be done through a config file in the repo so all users follow this rule gracefully?
Since GitHub doesn't support server-side hooks, you can use client-side hooks. As you are probably aware, these hooks can easily be bypassed (for example with git commit --no-verify) or disabled, but this is still a good way to do it.
core.hooksPath
Git v2.9 added the ability to keep client-side hooks in a folder of your choice via core.hooksPath. Prior to that, the hooks had to be placed inside the .git folder.
This will allow you to write scripts and put them anywhere. I assume you know what hooks are but if not feel free to ask.
How to do it?
Usually, you place the hooks inside your repo (or any other common folder).
# set the hooks path. for git config, the default location is --local
# so this configuration is locally per project
git config core.hooksPath .githooks

Related

How to make Git file size checking faster?

I have a bash script that checks whether the files to be committed fit a size limitation. However, when there are a large number of files, the script can take a long time to complete, even if there are no files that exceed the limit.
Here is the original script:
# max_allowed_packed_size (in bytes) is assumed to be defined earlier; it is not shown in this excerpt.
# Note: git cat-file -s reports the object's uncompressed size.
result=0
while IFS= read -r -d '' file
do
    echo "$file"
    if [[ -f "$file" ]]
    then
        file_size=$( git cat-file -s :"$file" )
        if [ "$file_size" -gt "$max_allowed_packed_size" ]
        then
            echo "File $file is $(( file_size / 2**20 )) MB, which is larger than our configured limit of $(( max_allowed_packed_size / 2**20 )) MB."
            result=1
        fi
    fi
done < <( git diff-index --ignore-submodules=all --cached --diff-filter=ACMRTUXB --name-only -z HEAD )
exit $result
Do you have any idea to improve the performance of checking the staged files?
1. Use Git LFS (Large File Storage): Git LFS is an open-source Git extension that replaces large files with text pointers. This allows Git to handle large files more efficiently, which can speed up file size checking.
2. Ignore large files: You can also speed up file size checking by ignoring large files that are not necessary for the repository. You can do this by creating a .gitignore file in the root directory of your repository and adding patterns for the files or file types you want to ignore.
3. Use shallow cloning: Shallow cloning means that you only clone a certain number of commit histories from the remote repository. This can significantly reduce the amount of data you need to download and check, and can speed up file size checking.
4. Use Git hooks: Git hooks are scripts that run automatically when certain Git events occur, such as a commit or push. You can use a Git hook to check the file size of new or modified files and reject them if they exceed a certain size limit. This can help prevent large files from being added to the repository in the first place, which can save time on file size checking.
5. Use a faster computer or network: If your computer or network is slow, file size checking will naturally be slower. Upgrading your computer or network can help speed up file size checking.
Install git-sizer, then:
$ git-sizer --help
$ git status # check status
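In the original script the likely cost is starting one git cat-file -s process per staged file. A sketch of an alternative, assuming a byte limit and an existing HEAD (not the very first commit): read the staged blob ids from a single git diff-index call and size them all with one git cat-file --batch-check process. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical function name): report staged blobs larger than a byte
# limit using ONE `git cat-file --batch-check` process for all files.
# Assumes HEAD exists. Exits 1 if any blob is over the limit, 0 otherwise.
find_large_staged_blobs() {
    local limit=$1
    # Raw diff-index lines look like ":100644 100644 <old> <new> M<TAB>path";
    # the first awk turns each into "<new blob id> path".
    # %(rest) makes cat-file echo back everything after the id (the path).
    git diff-index --cached --no-renames --ignore-submodules=all \
            --diff-filter=ACMT HEAD |
        awk -F'\t' '{ split($1, meta, " "); print meta[4], $2 }' |
        git cat-file --batch-check='%(objectsize) %(rest)' |
        awk -v limit="$limit" '
            {
                path = $0
                sub(/^[0-9]+ /, "", path)
                if ($1 + 0 > limit + 0) {
                    printf "%s is %d bytes (limit %d)\n", path, $1, limit
                    found = 1
                }
            }
            END { exit (found ? 1 : 0) }'
}
```

In a pre-commit hook this could end with something like find_large_staged_blobs 52428800 || exit 1. Paths that git quotes in non -z output (control characters, non-ASCII with core.quotePath) would be printed in quoted form.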

Git smudge and clean using local configuration branch

The local configuration of the project I'm working on involves changing several files in complicated ways that cannot be committed to any submitted branches. To work around this I've committed these local configuration changes to a dedicated local branch config, and have been running this bash script config.sh after starting a new work branch:
#!/bin/bash
# put relevant config files in array
mapfile -t files < <(git diff config develop --name-only)
# overwrite only those files to my working directory
git checkout config -- "${files[@]}"
# unstage them so they aren't accidentally committed
git reset HEAD -- "${files[@]}"
echo The following files were successfully overwritten for local configuration:
printf '\t%s\n' "${files[@]}"
Along with another deconfig.sh script that does the same in reverse. Run directly from the terminal, these scripts have been working fine, but I'd like to streamline the process further using git's clean and smudge filters. So I created a .gitattributes file:
*.* filter=config
and then added this to my .git/config file:
[filter "config"]
smudge = ./config.sh
clean = ./deconfig.sh
However, it just isn't working. If I had to guess it's because git isn't expecting me to run an additional checkout as part of a filter, which itself runs after the checkout command against all files. Most use cases for smudge and clean seem to involve simple find and replace operations, but that approach would be complicated to implement and difficult to maintain given the complexity of changes needed. I could store the configuration files in a static, external directory somewhere, but I'd like to smudge and clean based off the same configuration branch because the local configuration itself frequently evolves and benefits from versioning alongside the rest of the project, and ideally the branch could be used as a baseline for other devs for their local configuration. Git's filter-branch might be a better fit but git's own documentation recommends against using it at all. Is there a way to do this? Is there something wrong with my git configuration? Could the script itself be causing a problem? Any other possible approaches?
Although it is not documented anywhere, you cannot change the state of the working tree with a smudge or clean filter. Git expects to invoke the filter once for each file by piping data into it and reading the data from the standard output. In other words, these filters are intended to be invoked on a per-file basis and process only that file, not by modifying the working tree state.
The best solution to your problem is to avoid keeping a separate branch. Simply keep all of the files, both development and production, in some directory, and use a script to copy the correct one into place. The location of the running config file should be ignored, so the script won't cause Git to show anything as modified. Alternatively, keep a template somewhere, and have the script generate the appropriate one based on the environment. This is good if you have secrets for production that should not be checked in; you can pass them to the script through the environment and have the right values generated.
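The copy-into-place approach can be quite small. A sketch under assumed, hypothetical names: per-environment files are tracked under config/envs/<name>.yml, while the live file config/app.yml is listed in .gitignore, so overwriting it never shows up as a change.

```shell
#!/bin/bash
# Sketch of "keep all variants tracked, copy the right one into place".
# Hypothetical layout: config/envs/<env>.yml is tracked; config/app.yml is
# in .gitignore, so overwriting it leaves git status clean.
use_config() {
    local env=$1
    local src="config/envs/${env}.yml"
    if [ ! -f "$src" ]; then
        echo "use_config: no config for environment '$env' ($src)" >&2
        return 1
    fi
    # Overwrite the ignored live config with the selected variant.
    cp -f "$src" config/app.yml
    echo "config/app.yml <- $src"
}
```

Usage: use_config development after switching branches. The template variant mentioned above would generate config/app.yml instead of copying it, with secrets taken from environment variables.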
What you're doing is related to ignoring tracked files, which, as outlined in the Git FAQ, generally can't be done successfully.

Is there a delay in Git changing the file system?

I have a small Bash script which includes some Git commands. (For certain reasons, I cannot use git hooks here.)
Basically, it does
git pull origin <<some repo>> || { echo "Git pull FAILED"; exit 1; }
# do something with the new/changed files on the file system
In not reproducible cases, this fails. In these cases, old versions of the files (being at the state before git pull) are used instead of the new files (at the state after git pull).
However, if I manually do git pull and afterward run the other command, there was never any problem.
So, I'm now wondering if there is any delay/asynchronicity in Git changing the files on the file system after a pull. If yes: How can I deal with it (maybe avoiding sleep or something like that)? If not: What else could cause the confusion of file versions here?

Programmatically overwrite a specific local file with remote file on every git pull

I have an XML file that we consider binary in git. This file is externally modified and committed.
I don't care about who edited it and what's new in the file. I just want to have the latest file version at every pull. Currently, I get a merge conflict on every git pull.
I just want that this file is overwritten on every git pull, without manually doing stuff like git fetch/checkout/reset every time I have to sync my repo.
Careful: I want to overwrite just that file, not every file.
Thanks
I thought you could use Git Hooks, but I don't see one running before a pull...
A possible workaround would be to make a script to delete this file and chain with the needed git pull...
This answer shows how to always select the local version for conflicted merges on a specific file. However, midway through the answer, the author describes also how to always use the remote version.
Essentially, you have to use git attributes to specify a specific merge driver for that specific file, with:
echo binaryfile.xml merge=keepTheirs >> dir/with/binary/file/.gitattributes
git config merge.keepTheirs.name "always keep their file during merge"
git config merge.keepTheirs.driver "keepTheirs.sh %O %A %B"
git add -A
git commit -m "commit file for git attributes"
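and then create an executable keepTheirs.sh somewhere in your $PATH:
#!/bin/sh
# $2 is %A (our version, which git reads back as the merge result), $3 is %B (theirs)
cp -f "$3" "$2"
exit 0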
Please refer to that answer for a detailed explanation.
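Before relying on the driver, it can be exercised in a throwaway repository. A sketch with a hypothetical helper name; for self-containment it sets the driver inline as cp -f %B %A rather than calling keepTheirs.sh from $PATH, which should behave the same because git substitutes temporary file names for %A/%B and runs the command through the shell. Note that git only invokes a merge driver when both sides changed the file.

```shell
#!/bin/bash
# Throwaway-repo check (hypothetical helper name) that a keepTheirs-style
# merge driver resolves a conflicting edit in favour of the merged branch.
demo_keep_theirs() {
    local repo
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git config user.name demo
        git config user.email demo@example.com
        # %B = other side's version, %A = file git reads back as the result,
        # so copying B over A always keeps "theirs".
        git config merge.keepTheirs.driver 'cp -f %B %A'
        echo 'data.xml merge=keepTheirs' > .gitattributes
        echo base > data.xml
        git add .gitattributes data.xml
        git commit -qm base
        main=$(git symbolic-ref --short HEAD)
        git branch other
        echo ours > data.xml && git commit -qam ours
        git checkout -q other
        echo theirs > data.xml && git commit -qam theirs
        git checkout -q "$main"
        # Both sides changed data.xml, so git invokes the driver here.
        git merge -q --no-edit other >/dev/null || exit 1
        cat data.xml
    )
    local status=$?
    rm -rf "$repo"
    return "$status"
}
```

Running demo_keep_theirs should print the merged-in branch's content; a merge conflict instead would point at a missing attribute or driver configuration.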
If the changes to your files are not actual changes, you should not submit them. This will clutter your version history and cause numerous problems.
From your statement I’m not quite sure which is the case, but there are 2 possibilities:
The file in question is a local storage file, the contents of which are not relevant for your actual source code. In this case the file should be part of your .gitignore.
This file is actually part of your source and will thus have relevant changes in the future. By setting up the merge settings like you are planning to do, you will cause trouble once this file actually changes, because merges will then be destructive.
In this case the solution is a little bit more complicated (apart from getting a fix for the crappy tool that changes stuff it doesn’t actually change …). What you are probably looking for is the assume unchanged functionality of git. You can access it with this command:
git update-index --assume-unchanged <file>
From the git documentation (git help update-index):
You can set "assume unchanged" bit to paths you have not changed to cause git not to do this check. Note that setting this bit on a path does not mean git will check the contents of the file to see if it has changed — it makes git to omit any checking and assume it has not changed. When you make changes to working tree files, you have to explicitly tell git about it by dropping "assume unchanged" bit, either before or after you modify them.
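Because the bit is easy to forget about, it helps to be able to list and clear it. A sketch with hypothetical function names, relying on git ls-files -v printing a lowercase status tag (for example "h" instead of "H") for entries marked assume-unchanged:

```shell
#!/bin/bash
# Helpers around the assume-unchanged bit (function names are hypothetical).
# `git ls-files -v` uses a lowercase tag for assume-unchanged entries,
# e.g. "h path" instead of "H path".
list_assume_unchanged() {
    git ls-files -v | awk '/^[a-z] / { sub(/^[a-z] /, ""); print }'
}

# Drop the bit again so git starts noticing edits to the given files.
unmark_assume_unchanged() {
    git update-index --no-assume-unchanged -- "$@"
}
```

Usage: run list_assume_unchanged before a merge or when a file "mysteriously" refuses to show up as modified.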

Git symbolic links in Windows

Our developers use a mix of Windows and Unix-based OSes. Therefore, symbolic links created on Unix machines become a problem for Windows developers. In Windows (MSysGit), the symbolic link is converted to a text file with a path to the file it points to. Instead, I'd like to convert the symbolic link into an actual Windows symbolic link.
The (updated) solution I have to this is:
Write a post-checkout script that will recursively look for "symbolic link" text files.
Replace them with a Windows symbolic link (using mklink) with the same name and extension as dummy "symbolic link"
Ignore these Windows symbolic links by adding an entry into file .git/info/exclude
I have not implemented this, but I believe this is a solid approach to this problem.
What, if any, downsides do you see to this approach?
Is this post-checkout script even implementable? I.e., can I recursively find out the dummy "symlink" files Git creates?
Update note
For most Windows developers struggling with symlinks and git on Windows and the issues of sharing a repo with *nix systems, this topic is a solved problem -- once you update your Windows understanding of mklink a bit and turn on Developer Mode.
See this more modern answer before digging into the following deep git hacks discussion.
Older systems:
I was asking this exact same question a while back (not here, just in general), and ended up coming up with a very similar solution to OP's proposition.
I'll post the solution I ended up using.
But first I'll provide direct answers to OP's 3 questions:
Q: "What, if any, downsides do you see to this approach?"
A: There are indeed a few downsides to the proposed solution, mainly regarding an increased potential for repository pollution, or accidentally adding duplicate files while they're in their "Windows symlink" states. (More on this under "limitations" below.)
Q: "Is this post-checkout script even implementable? i.e. can I recursively find out the dummy "symlink" files git creates?"
A: Yes, a post-checkout script is implementable! Maybe not as a literal post-git checkout step, but the solution below has met my needs well enough that a literal post-checkout script wasn't necessary.
Q: "Has anybody already worked on such a script?"
A: Yes!
The Solution:
Our developers are in much the same situation as OP's: a mixture of Windows and Unix-like hosts, repositories and submodules with many git symlinks, and no native support (yet) in the release version of MsysGit for intelligently handling these symlinks on Windows hosts.
Thanks to Josh Lee for pointing out the fact that git commits symlinks with special filemode 120000. With this information it's possible to add a few git aliases that allow for the creation and manipulation of git symlinks on Windows hosts.
Creating git symlinks on Windows
git config --global alias.add-symlink '!'"$(cat <<'ETX'
__git_add_symlink() {
if [ $# -ne 2 ] || [ "$1" = "-h" ]; then
printf '%b\n' \
'usage: git add-symlink <source_file_or_dir> <target_symlink>\n' \
'Create a symlink in a git repository on a Windows host.\n' \
'Note: source MUST be a path relative to the location of target'
[ "$1" = "-h" ] && return 0 || return 2
fi
source_file_or_dir=${1#./}
source_file_or_dir=${source_file_or_dir%/}
target_symlink=${2#./}
target_symlink=${target_symlink%/}
target_symlink="${GIT_PREFIX}${target_symlink}"
target_symlink=${target_symlink%/.}
: "${target_symlink:=.}"
if [ -d "$target_symlink" ]; then
target_symlink="${target_symlink%/}/${source_file_or_dir##*/}"
fi
case "$target_symlink" in
(*/*) target_dir=${target_symlink%/*} ;;
(*) target_dir=$GIT_PREFIX ;;
esac
target_dir=$(cd "$target_dir" && pwd)
if [ ! -e "${target_dir}/${source_file_or_dir}" ]; then
printf 'error: git-add-symlink: %s: No such file or directory\n' \
"${target_dir}/${source_file_or_dir}" >&2
printf '(Source MUST be a path relative to the location of target!)\n' >&2
return 2
fi
git update-index --add --cacheinfo 120000 \
"$(printf '%s' "$source_file_or_dir" | git hash-object -w --stdin)" \
"${target_symlink}" \
&& git checkout -- "$target_symlink" \
&& printf '%s -> %s\n' "${target_symlink#$GIT_PREFIX}" "$source_file_or_dir" \
|| return $?
}
__git_add_symlink
ETX
)"
Usage: git add-symlink <source_file_or_dir> <target_symlink>, where the argument corresponding to the source file or directory must take the form of a path relative to the target symlink. You can use this alias the same way you would normally use ln.
E.g., the repository tree:
dir/
dir/foo/
dir/foo/bar/
dir/foo/bar/baz (file containing "I am baz")
dir/foo/bar/lnk_file (symlink to ../../../file)
file (file containing "I am file")
lnk_bar (symlink to dir/foo/bar/)
Can be created on Windows as follows:
git init
mkdir -p dir/foo/bar/
echo "I am baz" > dir/foo/bar/baz
echo "I am file" > file
git add -A
git commit -m "Add files"
git add-symlink ../../../file dir/foo/bar/lnk_file
git add-symlink dir/foo/bar/ lnk_bar
git commit -m "Add symlinks"
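The core of add-symlink is plain git plumbing: a git symlink is a blob whose content is the link target, recorded in the index with mode 120000. A stripped-down sketch of just that step, without the alias's path handling; the function name is hypothetical and it is meant to be run from the repository root:

```shell
#!/bin/bash
# Sketch of the plumbing behind `git add-symlink` (hypothetical name).
# A git symlink = blob containing the target path + index entry with mode 120000.
stage_symlink() {
    local target=$1 link_path=$2 blob
    # Store the target path (no trailing newline) as a blob and get its id.
    blob=$(printf '%s' "$target" | git hash-object -w --stdin) || return 1
    # Record it in the index as a symlink entry.
    git update-index --add --cacheinfo 120000 "$blob" "$link_path"
}
```

After stage_symlink file lnk, git ls-files -s lnk shows mode 120000; a following git checkout -- lnk materializes it in the working tree, as the alias does.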
Replacing git symlinks with NTFS hardlinks+junctions
git config --global alias.rm-symlinks '!'"$(cat <<'ETX'
__git_rm_symlinks() {
case "$1" in (-h)
printf 'usage: git rm-symlinks [symlink] [symlink] [...]\n'
return 0
esac
ppid=$$
case $# in
(0) git ls-files -s | grep -E '^120000' | cut -f2 ;;
(*) printf '%s\n' "$@" ;;
esac | while IFS= read -r symlink; do
case "$symlink" in
(*/*) symdir=${symlink%/*} ;;
(*) symdir=. ;;
esac
git checkout -- "$symlink"
src="${symdir}/$(cat "$symlink")"
posix_to_dos_sed='s_^/\([A-Za-z]\)_\1:_;s_/_\\\\_g'
doslnk=$(printf '%s\n' "$symlink" | sed "$posix_to_dos_sed")
dossrc=$(printf '%s\n' "$src" | sed "$posix_to_dos_sed")
if [ -f "$src" ]; then
rm -f "$symlink"
cmd //C mklink //H "$doslnk" "$dossrc"
elif [ -d "$src" ]; then
rm -f "$symlink"
cmd //C mklink //J "$doslnk" "$dossrc"
else
printf 'error: git-rm-symlink: Not a valid source\n' >&2
printf '%s =/=> %s (%s =/=> %s)...\n' \
"$symlink" "$src" "$doslnk" "$dossrc" >&2
false
fi || printf 'ESC[%d]: %d\n' "$ppid" "$?"
git update-index --assume-unchanged "$symlink"
done | awk '
BEGIN { status_code = 0 }
/^ESC\['"$ppid"'\]: / { status_code = $2 ; next }
{ print }
END { exit status_code }
'
}
__git_rm_symlinks
ETX
)"
git config --global alias.rm-symlink '!git rm-symlinks' # for back-compat.
Usage:
git rm-symlinks [symlink] [symlink] [...]
This alias can remove git symlinks one-by-one or all-at-once in one fell swoop. Symlinks will be replaced with NTFS hardlinks (in the case of files) or NTFS junctions (in the case of directories). The benefit of using hardlinks+junctions over "true" NTFS symlinks is that elevated UAC permissions are not required in order for them to be created.
To remove symlinks from submodules, just use git's built-in support for iterating over them:
git submodule foreach --recursive git rm-symlinks
But, for every drastic action like this, a reversal is nice to have...
Restoring git symlinks on Windows
git config --global alias.checkout-symlinks '!'"$(cat <<'ETX'
__git_checkout_symlinks() {
case "$1" in (-h)
printf 'usage: git checkout-symlinks [symlink] [symlink] [...]\n'
return 0
esac
case $# in
(0) git ls-files -s | grep -E '^120000' | cut -f2 ;;
(*) printf '%s\n' "$@" ;;
esac | while IFS= read -r symlink; do
git update-index --no-assume-unchanged "$symlink"
rmdir "$symlink" >/dev/null 2>&1
git checkout -- "$symlink"
printf 'Restored git symlink: %s -> %s\n' "$symlink" "$(cat "$symlink")"
done
}
__git_checkout_symlinks
ETX
)"
git config --global alias.co-symlinks '!git checkout-symlinks'
Usage: git checkout-symlinks [symlink] [symlink] [...], which undoes git rm-symlinks, effectively restoring the repository to its natural state (except for your changes, which should stay intact).
And for submodules:
git submodule foreach --recursive git checkout-symlinks
Limitations:
Directories/files/symlinks with spaces in their paths should work. But tabs or newlines? YMMV… (By this I mean: don’t do that, because it will not work.)
If yourself or others forget to git checkout-symlinks before doing something with potentially wide-sweeping consequences like git add -A, the local repository could end up in a polluted state.
Using our "example repo" from before:
echo "I am nuthafile" > dir/foo/bar/nuthafile
echo "Updating file" >> file
git add -A
git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# new file: dir/foo/bar/nuthafile
# modified: file
# deleted: lnk_bar # POLLUTION
# new file: lnk_bar/baz # POLLUTION
# new file: lnk_bar/lnk_file # POLLUTION
# new file: lnk_bar/nuthafile # POLLUTION
#
Whoops...
For this reason, it's nice to include these aliases as steps to perform for Windows users before-and-after building a project, rather than after checkout or before pushing. But each situation is different. These aliases have been useful enough for me that a true post-checkout solution hasn't been necessary.
References:
http://git-scm.com/book/en/Git-Internals-Git-Objects
http://technet.microsoft.com/en-us/library/cc753194
Last Update: 2019-03-13
POSIX compliance (well, except for those mklink calls, of course) — no more Bashisms!
Directories and files with spaces in them are supported.
Zero and non-zero exit status codes (for communicating success/failure of the requested command, respectively) are now properly preserved/returned.
The add-symlink alias now works more like ln(1) and can be used from any directory in the repository, not just the repository’s root directory.
The rm-symlink alias (singular) has been superseded by the rm-symlinks alias (plural), which now accepts multiple arguments (or no arguments at all, which finds all of the symlinks throughout the repository, as before) for selectively transforming git symlinks into NTFS hardlinks+junctions.
The checkout-symlinks alias has also been updated to accept multiple arguments (or none at all, == everything) for selective reversal of the aforementioned transformations.
Final Note: While I did test loading and running these aliases using Bash 3.2 (and even 3.1) for those who may still be stuck on such ancient versions for any number of reasons, be aware that versions as old as these are notorious for their parser bugs. If you experience issues while trying to install any of these aliases, the first thing you should look into is upgrading your shell (for Bash, check the version with CTRL+X, CTRL+V). Alternatively, if you’re trying to install them by pasting them into your terminal emulator, you may have more luck pasting them into a file and sourcing it instead, e.g. as
. ./git-win-symlinks.sh
You can find the symlinks by looking for files that have a mode of 120000, possibly with this command:
git ls-files -s | awk -F'\t' '$1 ~ /^120000 / {print $2}'
Once you replace the links, I would recommend marking them as unchanged with git update-index --assume-unchanged, rather than listing them in .git/info/exclude.
2020+ TL;DR Answer
Enable "Developer Mode" in Windows 10/11 -- gives mklink permissions
Ensure symlinks are enabled in git with (at least) one of
System setting: check the checkbox when installing msysgit
Global setting: git config --global core.symlinks true
Local setting: git config core.symlinks true
Be careful, support for symlinks in git on Windows is relatively new.
There are some bugs that still affect some git clients.
Notably, symlinks with relative (..) paths are mangled in some programs because of a (fixed) regression in libgit2.
For instance, GitKraken is affected by this because they are waiting on nodegit to update libgit2 from v0.x (regression) to v1.x (fixed).
Recreate missing/broken symlinks
Various levels of success have been reported across multiple git clients with one of these (increasingly forceful and "dangerous") options
Checkout: git checkout -- path/to/symlink
Restore (since git v2.23.0): git restore -- path/to/symlink
Switch branches (away and back)
Hard Reset: git reset --hard
Delete local repository and clone again
Troubleshooting
git config --show-scope --show-origin core.symlinks will show you the level (aka "scope") the setting is set, where the configuration file (aka "origin") that is persisting it is, and the current value of the setting. Most likely a "local" configuration is overriding the "global" or "system" setting. git config --unset core.symlinks will clear a "local" setting allowing a higher level setting to take effect.
The most recent version of Git SCM (tested on version 2.11.1) allows you to enable symbolic links. But you have to clone the repository with the symbolic links again, using git clone -c core.symlinks=true <URL>. You need to run this command with administrator rights. It is also possible to create symbolic links on Windows with mklink.
Check out the wiki.
So as things have changed with Git since a lot of these answers were posted, here are the correct instructions to get symbolic links working correctly in Windows as of:
August 2018
1. Make sure Git is installed with symbolic link support
2. Tell Bash to create native Windows symbolic links instead of copies
(git folder)/etc/bash.bashrc
Add to bottom - export MSYS=winsymlinks:nativestrict
3. Set Git config to use symbolic links
git config core.symlinks true
or
git clone -c core.symlinks=true <URL>
Note: I have tried adding this to the global Git configuration and at the moment it is not working for me, so I recommend adding this to each repository...
4. pull the repository
Note: Unless you have enabled developer mode in the latest version of Windows 10, you need to run Bash as administrator to create symbolic links
5. Reset all symbolic links (optional)
If you have an existing repository, or are using submodules you may find that the symbolic links are not being created correctly so to refresh all the symbolic links in the repository you can run these commands.
find -type l -delete
git reset --hard
Note: this will reset any changes since the last commit, so make sure you have committed first
It ought to be implemented in MSysGit, but there are two downsides:
Symbolic links are only available in Windows Vista and later (it should not be an issue in 2011, and yet it is...), since older versions only support directory junctions.
(the big one) Microsoft considers symbolic links a security risk and so only administrators can create them by default. You'll need to elevate privileges of the Git process or use fstool to change this behavior on every machine you work on.
I did a quick search and there is work being actively done on this; see issue 224.
Short answer: They are now supported nicely, if you can enable developer mode.
From Symlinks in Windows 10!:
Now in Windows 10 Creators Update, a user (with admin rights) can first enable Developer Mode, and then any user on the machine can run the mklink command without elevating a command-line console.
What drove this change? The availability and use of symlinks is a big deal to modern developers:
Many popular development tools like git and package managers like npm recognize and persist symlinks when creating repos or packages, respectively. When those repos or packages are then restored elsewhere, the symlinks are also restored, ensuring disk space (and the user’s time) isn’t wasted.
It is easy to overlook with all the other announcements of the Creators Update, but if you enable Developer Mode, you can create symbolic links without elevated privileges. You might have to reinstall Git and make sure symbolic link support is enabled, as it's not by default.
I would suggest you don't use symlinks within the repository. Store the actual content inside the repository and then place symlinks outside the repository that point to the content.
So let’s say you are using a repository to compare hosting your site on a Unix-like system with hosting on Windows. Store the content in your repository, let’s say /httpRepoContent and c:\httpRepoContent, with this being the folder that is synced via Git, SVN, etc.
Then, replace the content folder of your web server (/var/www and c:\program files\web server\www {names don't really matter, edit if you must}) with a symbolic link to the content in your repository. The web servers will see the content as actually in the 'right' place, but you get to use your source control.
However, if you need to use symlinks within the repository, you will need to look into something like pre/post commit scripts. I know you can use them to do things such as running code files through a formatter, so it should be possible to convert the symlinks between platforms.
If anyone knows a good place to learn how to write these scripts for the common source control systems (SVN, Git, and Hg/Mercurial), then please do add a comment.
Here is a batch script for converting symbolic links in a repository (files only), based on Josh Lee's answer. A script with some additional checks for administrator rights is at https://gist.github.com/Quazistax/8daf09080bf54b4c7641.
@echo off
pushd "%~dp0"
setlocal EnableDelayedExpansion
for /f "tokens=3,*" %%e in ('git ls-files -s ^| findstr /R /C:"^120000"') do (
call :processFirstLine %%f
)
REM pause
goto :eof
:processFirstLine
@echo.
@echo FILE: %1
dir "%~f1" | find "<SYMLINK>" >NUL && (
@echo FILE already is a symlink
goto :eof
)
for /f "usebackq tokens=*" %%l in ("%~f1") do (
@echo LINK TO: %%l
del "%~f1"
if not !ERRORLEVEL! == 0 (
@echo FAILED: del
goto :eof
)
setlocal
call :expandRelative linkto "%1" "%%l"
mklink "%~f1" "!linkto!"
endlocal
if not !ERRORLEVEL! == 0 (
@echo FAILED: mklink
@echo reverting deletion...
git checkout -- "%~f1"
goto :eof
)
git update-index --assume-unchanged "%1"
if not !ERRORLEVEL! == 0 (
@echo FAILED: git update-index --assume-unchanged
goto :eof
)
@echo SUCCESS
goto :eof
)
goto :eof
:: param1 = result variable
:: param2 = reference path from which relative will be resolved
:: param3 = relative path
:expandRelative
pushd .
cd "%~dp2"
set %1=%~f3
popd
goto :eof
For those using Cygwin on Windows Vista, Windows 7, or above, the native git command can create "proper" symbolic links that are recognized by Windows apps such as Android Studio. You just need to set the CYGWIN environment variable to include winsymlinks:native or winsymlinks:nativestrict as such:
export CYGWIN="$CYGWIN winsymlinks:native"
The downside to this (and a significant one at that) is that the Cygwin shell has to be "Run as Administrator" in order for it to have the OS permissions required to create those kinds of symbolic links. Once they're created, though, no special permissions are required to use them. As long as they aren't changed in the repository by another developer, git thereafter runs fine with normal user permissions.
Personally, I use this only for symbolic links that are navigated by Windows applications (i.e., non-Cygwin) because of this added difficulty.
For more information on this option, see this Stack Overflow question: How to make a symbolic link with Cygwin in Windows 7
I just tried with Git 2.30.0 (released 2020-12-28).
This is not a full answer, but a few useful tidbits nonetheless. (Feel free to cannibalize for your own answer.)
Git Wiki Entry
There's a documentation link when installing Git for Windows.
This link takes you here: https://github.com/git-for-windows/git/wiki/Symbolic-Links, and it is quite a long discussion.
FYI: There are at least three "kinds of links". To highlight an important aspect of this wiki entry that I didn't know: there are several ways to create something that looks like a symbolic link on the surface, but they are very different on a technical level:
git bash's "ln -s"
Which just copies things. Oh, boy. That was unexpected to me.
(FYI: Plain Cygwin does not do this. Mobaxterm does not do this. Instead they both create something that their stat command actually recognizes as "symbolic link".)
cmd.exe's builtin "mklink" command with the "/D" parameter
Which creates a directory symbolic link. (See the Microsoft documentation.)
cmd.exe's builtin "mklink" command with the "/J" parameter.
Which creates a directory junction AKA soft link AKA reparse point. (See the Microsoft documentation.)
Release Notes Entry
Symbolic links also keep popping up in the release notes. As of 2.30.0, this is still listed as a "Known issue":
On Windows 10 before 1703, or when Developer Mode is turned off, special permissions are required when cloning repositories with symbolic links, therefore support for symbolic links is disabled by default. Use git clone -c core.symlinks=true <URL> to enable it, see details here.
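For a repository that was already cloned without symlink support, a sketch of the after-the-fact equivalent, assuming the required permissions are already in place (for example, Developer Mode is on): enable `core.symlinks` for that clone, then delete and re-check out the mode-120000 entries so Git writes them again as real links. The helper name `enable_symlinks_in_clone` is hypothetical, and it discards any local edits to those placeholder files.

```shell
# Enable core.symlinks in an existing clone, then re-check out the tracked
# symbolic links so Git recreates them as links instead of plain text files.
enable_symlinks_in_clone() {
    git config core.symlinks true
    # Remove the working-tree copy of each mode-120000 entry and restore it
    git ls-files -s | awk '$1 == "120000"' | cut -f2 |
    while IFS= read -r path; do
        rm -f -- "$path"
        git checkout -- "$path"
    done
}
```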
I use symbolic links all the time between my document root and my Git repository directory, because I like to keep them separate. On Windows I use the mklink /j option, which creates a junction; the junction seems to let Git behave normally:
>mklink /j <path of link> <target directory>
For example:
>mklink /j c:\gitRepos\Posts C:\Bitnami\wamp\apache2\htdocs\Posts
I was looking for an easy solution to deal with the Unix symbolic links on Windows. Thank you very much for the Git aliases in previous answers.
There is one small optimization that can be made to the rm-symlinks alias so that it doesn't delete the files in the destination folder if the alias is accidentally run a second time. Note the new if condition in the loop, which makes sure the file is not already a link to a directory before the logic runs.
git config --global alias.rm-symlinks '!__git_rm_symlinks(){
for symlink in $(git ls-files -s | grep -E "^120000" | cut -f2); do
if [ -d "$symlink" ]; then
continue
fi
git rm-symlink "$symlink"
git update-index --assume-unchanged "$symlink"
done
}; __git_rm_symlinks'
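Before running the alias, a read-only preview of what the guard does can help. The hypothetical `preview_rm_symlinks` function below only prints which mode-120000 entries would be converted and which would be skipped because the path already resolves to a directory; it does not call `git rm-symlink` (the alias from an earlier answer). It also reads paths line by line, so unlike the `for ... in $(...)` loop it does not split paths that contain spaces.

```shell
# Dry run of the rm-symlinks loop: report which symlink entries would be
# converted and which would be skipped because they are already directories.
preview_rm_symlinks() {
    git ls-files -s | awk '$1 == "120000"' | cut -f2 |
    while IFS= read -r symlink; do
        if [ -d "$symlink" ]; then
            printf 'skip: %s\n' "$symlink"
        else
            printf 'convert: %s\n' "$symlink"
        fi
    done
}
```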
One simple trick we use is to just call git add --all twice in a row.
For example, our Windows 7 commit script calls:
git add --all
git add --all
The first add treats the link as text and stages the folders for deletion.
The second add traverses the link correctly and undoes the delete by restoring the files.
It's less elegant than some of the other proposed solutions, but it is a simple fix for some of our legacy environments that had symbolic links added.
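As a sketch, the two-pass step can be wrapped in a small shell function. The final `git diff --cached --name-only --diff-filter=D` line is our addition, not part of the original script: it lists any paths still staged for deletion after the second pass, so leftover deletions can be reviewed before committing. The function name `stage_all_twice` is hypothetical.

```shell
# Stage everything twice (the first pass may stage linked folders as deleted,
# the second pass re-adds the files behind the links), then list any paths
# that are still staged for deletion.
stage_all_twice() {
    git add --all
    git add --all
    git diff --cached --name-only --diff-filter=D
}
```

An empty result means nothing is left staged for deletion.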
Here's a PowerShell script to replace Unix symbolic links with Windows ones.
# This fixes permission denied errors you might get when
# there are Git symbolic links being used on repositories that
# you share in both POSIX (usually the host) and Windows (VM).
#
# This is not an issue if you are checking out the same
# repository separately in each platform. This is only an issue
# when it's the same working set (AKA make a change without
# committing on OS X, go to Windows VM and Git status would show
# you that change).
#
# Based on this answer on Stack Overflow: http://stackoverflow.com/a/5930443/18475
#
# No warranties. Good luck.
#
# NOTE: It must be run in elevated PowerShell
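# Paths of index entries with mode 120000 (Git symbolic links)
$symlinks = git ls-files -s | Where-Object { $_ -match '^120000 ' } | ForEach-Object { ($_ -split "`t", 2)[1] }
foreach ($symlink in $symlinks) {
    # The checked-out placeholder file holds the link target as text
    $content = (Get-Content $symlink -TotalCount 1).Replace("/", "\")
    $filename = Split-Path $symlink -Leaf
    $parent = Split-Path $symlink -Parent
    if (-not $parent) { $parent = "." }
    # Create the link from its own folder so a relative target resolves correctly
    Push-Location $parent
    Remove-Item $filename
    # Quoted so that "->" is not parsed as a redirection into $filename
    Write-Host "Linking $content -> $filename"
    New-Item -ItemType SymbolicLink -Path $filename -Target $content
    Pop-Location
    git update-index --assume-unchanged $symlink
}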

Resources