I am using Git for Windows (version 2.15, but the same issue occurs in 2.14 and I think older versions as well) and I noticed a rather annoying behavior: When I perform some basic git operations*), the modification date of the .git/objects/pack/pack-*.pack file changes. The file itself remains unchanged, but the last modification date field gets updated, which causes my backup software to think the file was changed and needs to be added to my differential backup. Because my .pack files are rather large, this increases the size of my daily backups significantly. Is there a way to prevent this behavior? That is, keep the pack file completely unchanged, including its metadata, until I perform a git gc or git repack?
Unfortunately, I wasn't able to pinpoint which operation causes this behavior. When it happened today, I only used git status, git log, git add, git mv and git commit and nothing else and the date/time got changed, but when I tried to replicate the behavior on my yesterday's backup, the date change didn't occur. I guess next time I will run Process Monitor and watch accesses to the file, but in the meanwhile, does anyone have an idea of what might be causing this problem? Thanks.
Instead of referencing your Git repo itself for your backup program to process (with the date issue), you could have:
a task which does a git bundle of your repo (that generates only one file)
your backup program would back up only that one file.
That way, you bypass entirely the modification date issue for those pack files.
You can either save and keep only one copy of a full bundle of the repo.
Or make incremental bundles.
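As a rough illustration of the bundle approach above, here is a minimal Python sketch that builds the `git bundle create` command line for either a full or an incremental bundle. The function name, the file names and the branch name `master` are assumptions made for the example; only `git bundle create`, `--all` and the `<rev>..<branch>` range syntax come from Git itself.

```python
from pathlib import Path
from typing import List, Optional


def bundle_command(repo: Path, bundle: Path,
                   since: Optional[str] = None,
                   branch: str = "master") -> List[str]:
    """Build a `git bundle create` command line (hypothetical helper).

    since=None     -> full bundle containing every ref (--all).
    since="<rev>"  -> incremental bundle with only the commits on
                      `branch` that are not reachable from <rev>.
    """
    cmd = ["git", "-C", str(repo), "bundle", "create", str(bundle)]
    if since is None:
        cmd.append("--all")
    else:
        cmd.append(f"{since}..{branch}")
    return cmd


def run(cmd: List[str]) -> None:
    """Execute the command; raises CalledProcessError if git fails."""
    import subprocess
    subprocess.run(cmd, check=True)
```

A scheduled task could call `run(bundle_command(Path("myrepo"), Path("myrepo.bundle")))` and the backup program would then pick up only `myrepo.bundle`. For incremental bundles, `since` would typically be a tag that you move forward after each successful backup; `git bundle verify` can check a bundle before you rely on it.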
In the end it turns out that Edward Thomson's answer explains why no "real" solution is possible. However, to facilitate my needs, I wrote a simple Windows command-line application which scans through a tree of directories, locates possible Git repositories, locates their packfiles and changes the date/time of each .pack file to that of the respective .idx file. So far it seems to run OK. I did not encounter any garbage collection issues yet, anyway. I did not release the tool yet, because I rather suspect no one else cares, but if someone is interested, I can upload it somewhere.
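For readers who only want the idea, the following is a minimal Python sketch of the same technique, not the released tool: it handles a single repository (no directory-tree scan), and the function name is made up for the example. It copies the modification time of each .idx file onto the matching .pack file.

```python
import os
from pathlib import Path
from typing import List


def sync_pack_mtimes(repo: Path) -> List[Path]:
    """Set each .pack file's mtime to that of its matching .idx file.

    Returns the list of pack files whose timestamp was changed.
    """
    pack_dir = repo / ".git" / "objects" / "pack"
    changed = []
    for pack in sorted(pack_dir.glob("pack-*.pack")):
        idx = pack.with_suffix(".idx")
        if not idx.exists():
            continue  # no index to copy the timestamp from
        idx_stat = idx.stat()
        if pack.stat().st_mtime_ns != idx_stat.st_mtime_ns:
            # Only the timestamps change; the file contents are untouched.
            os.utime(pack, ns=(idx_stat.st_atime_ns, idx_stat.st_mtime_ns))
            changed.append(pack)
    return changed
```

Running this just before the backup starts resets the dates Git has refreshed since the last run. Since it undoes Git's timestamp refresh, it carries the garbage-collection risk described in the other answers.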
Apparently, someone is interested. So the program is released as of now. Not on GitHub, but still as open source, under the 3-clause BSD license. Download the binaries here: https://www.pepak.net/files/git/gitpacksync-0.01.zip
and the source code here: https://www.pepak.net/files/git/gitpacksync-0.01-source.zip
If you try to disable this then you would be prone to see subtle bugs where objects that are still in use will disappear from your repository.
You had trouble pinpointing the exact operation because every operation that adds files will do it.
This is very much intentional - Git refreshes the timestamps of objects in the database (updating the timestamp on either loose objects or packfiles) to know when an object was last written. Whenever you create a new commit, it will update the timestamp on all the files that contain objects that were referenced.
This is important as it helps the tools that remove data (like prune) avoid race conditions: an object may be dereferenced and then re-referenced. Prune will also look at the timestamp, so by touching the file, it will not be eligible for garbage collection.
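To make the mechanism concrete, here is a small Python sketch of the idea, not Git's actual code: "freshening" a file means bumping its timestamp without rewriting it, and a pruning tool then skips anything newer than a grace period. The function names and the two-week grace value are assumptions for the illustration.

```python
import os
import time
from pathlib import Path

# Assumed grace period for the illustration: two weeks, in seconds.
GRACE_SECONDS = 14 * 24 * 3600


def freshen(path: Path) -> None:
    """Set the file's timestamps to 'now' without touching its contents."""
    os.utime(path, None)


def is_prunable(path: Path, grace_seconds: float, now: float) -> bool:
    """Simplified expiry check: only files older than the grace period qualify."""
    return now - path.stat().st_mtime > grace_seconds
```

A file last written a month ago is prunable under this check; after `freshen()` it no longer is, even though its bytes are identical. That is exactly the change a date-based backup tool notices.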
Related
Git 2.2.0 and 2.2.1 seem to modify the timestamps of old .git/objects/pack/pack-*.pack files occasionally, for no good reason.
It just changes the timestamp; the contents are identical.
Debugging this is difficult as it seems to make changes only fairly rarely.
I have never seen anything like this in any Git version before 2.2.0. What is happening, and can I fix it somehow? Because of the useless timestamp updates I am getting suddenly large amounts of changes for incremental backups.
Git keeps more information on disk than absolutely necessary to record all information in the repository. The unnecessary information is kept to accelerate certain operations and/or avoid having to rewrite files. The algorithm to decide when to delete some of the unnecessary files uses modification time of the pack files as part of the decision process (see find_lru_pack). Therefore mtime is used by a cache-like mechanism in git. Modification time of pack files is changed in git without modifying the file (see freshen_file function) in order to aid the correct caching and avoiding evicting files likely to be used again.
If you modify freshen_file in sha1_file.c to be a no-op, then mtimes should never be modified. However, this leaves you open to potential data loss if a new commit with the same data as before is written just as a garbage collection happens (thanks to the comment below for pointing this out).
Another approach would be to not backup the git repo itself (with its packfiles), but to backup bundles:
first, you can create incremental bundle or a full bundle of your repo
second, once created, a bundle is one single file, very easy to backup/copy around (less error-prone than an rsync of multiple files, with potential date issue).
the process is easily scriptable (my script does incremental or full backup)
I have been using Xcode with Subversion for some time now. It caused no problems while I was the only developer (I used only 2 commands, commit and add).
But now I have to share the code with another developer (who has never used any kind of version control), and integrating/merging the code has become a nightmare. No problems occur when we integrate/merge .h/.m files, but as soon as it comes to ".nib", "xcodeproj" and ".xcdatamodeld" files, we really don't know what to do.
Whenever we tried to merge "xcodeproj", the project got corrupted, and merging ".xcdatamodeld" was more or less impossible for us.
So I was wondering if someone can share his/her experience on how to effectively use subversion/git/mercurial with XCode 4.0 in multiuser environment? or share a link, which can explain how to use subversion effectively in multiuser environment.
Thanks.
Are you doing this using Subversion? For 90% to 99% of the files in your repository, the standard Subversion workflow of checkout, edit, commit works well. However, some types of files, such as JPEGs and GIFs, simply don't merge well. In this case, you'll have to do it the way we used to in the old SCCS and RCS days: before you can edit and commit a file, you must lock it.
Locking a file prevents others from editing the same file and committing changes while you're doing your work on the file. It's crude, but it works. In Subversion, you can always lock any file you're editing, but if the file has the property svn:needs-lock on it, it will be checked out as read-only. You have to lock the file before editing it to make it writable, and you're not allowed to commit the file unless it is locked.
So, for those files, set the svn:needs-lock property on them.
You can automatically set this property on all newly added files (depending upon suffix) by setting the auto-properties in your Subversion client configuration.
And, if you really, really want to make sure that all .nibs and xcodeproj and all of the other files of these types have svn:needs-lock set on them, you can use my pre-commit hook, which will prevent these files from being committed unless this property is set.
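As a rough sketch of the first step, the Python helper below builds the `svn propset svn:needs-lock` commands for files that match a set of patterns. The helper name and the example patterns (`*.nib`, `*.pbxproj`) are assumptions; adjust them to the files that actually cause merge trouble in your project. The property only takes effect on files, which is why the example targets the project.pbxproj file inside the .xcodeproj folder rather than the folder itself.

```python
import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List


def needs_lock_commands(paths: Iterable[str],
                        patterns: Iterable[str]) -> List[List[str]]:
    """Return one `svn propset svn:needs-lock` command per matching file.

    `paths` are working-copy paths using forward slashes; `patterns`
    are shell-style wildcards matched against the file name only.
    """
    patterns = list(patterns)
    commands = []
    for path in paths:
        name = PurePosixPath(path).name
        if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
            # The property value is arbitrary; '*' is a common choice.
            commands.append(["svn", "propset", "svn:needs-lock", "*", path])
    return commands
```

Each returned command can be run with `subprocess.run(cmd, check=True)` and the property change then has to be committed. Afterwards a developer runs `svn lock <file>` before editing one of these files.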
There is no failsafe way to merge these kinds of files that I am aware of. So you will have to try to ensure that only one person is changing these files at a time. That won't always work, so just log what you changed in the file in the commit message. Then, if there is a conflict, you can resolve it manually by taking the version that changed more of the file and manually redoing what the other person did.
That's normally not a big deal for changes like adding a new source file to an .xcodeproj or changing the alignment of an element in a .nib. It becomes a problem if your project is huge or your nib contains the whole interface. For this to work well (which in practice it does), you need to split your projects into sub-projects if they grow too large.
I had the same problem working with 2 other developers using Xcode with git. Unfortunately, the Xcode project file is an XML file that tracks the files included in the project as well as its settings. I'm not certain, but I think .nib files are XML files as well. Someone can correct me on that.
Git did a great job at merging the Xcode project file, and never really had any problems with our *.nib files either. The only time we did have a problem is when we both added/removed files with the same names, or someone did a lot of heavy removing and adding of a lot of files.
The only way we solved this was to have each other push and pull as soon as we added/removed files. That way each person had the latest files and didn't add a file in their own repository and then pull a commit that contained the same file, or keep making changes to a file that had been removed or renamed.
That is the best solution we found: as soon as someone adds or removes a file, have everyone else on the team pull. Not a great solution, btw. However, you should be committing often anyway.
I've got a folder under version control; the contents aren't source, but they are binaries that are modified frequently and would generally get committed once a day.
Problem is, the consumers of those files can't grasp the concept of source control, they don't realistically have access to the folder in question and 'they can't be bothered' to commit once a day.
What I'd like to do, is have an auto-commit, once a day (4 am) of that folder. Are there any existing tools, or do I have to write one?
You could set up an auto-commit using cron or at and a script file calling the appropriate svn ci commands on each client PC. Auto-committing is not a good idea IMO:
Broken, unfinished code might get checked in
Any changes will not be documented
but I understand from your description that this may not matter in your case.
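A minimal sketch of such a script in Python, assuming the folder is already an svn working copy and that credentials are cached on the machine. The function names and the commit message format are made up for the example; `svn add --force` and `svn commit --message` are standard svn commands.

```python
import subprocess
from datetime import datetime
from typing import List


def auto_commit_commands(folder: str, now: datetime) -> List[List[str]]:
    """Build the svn commands for one automatic commit of `folder`."""
    message = f"Automatic daily commit {now:%Y-%m-%d %H:%M}"
    return [
        # Schedule any new, unversioned files for addition.
        ["svn", "add", "--force", "--quiet", folder],
        # Commit everything that changed under the folder.
        ["svn", "commit", "--message", message, folder],
    ]


def auto_commit(folder: str) -> None:
    """Run the commands in order; stops at the first failure."""
    for cmd in auto_commit_commands(folder, datetime.now()):
        subprocess.run(cmd, check=True)
```

With cron, an entry such as `0 4 * * * python3 /path/to/auto_commit.py` (path hypothetical) would run it at 4 am. Note that this sketch does not handle files that were deleted outside of svn; those would need an extra `svn rm` step.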
Well this file was put in the repo by mistake and was deleted and added to ignore list. However, because it once existed, my repo is now > 4GB in size and some SVN functions take years to complete. I would appreciate any help and tips. (I'm on XP if it matters)
How do I completely remove a file from the repository's history?
There are special cases where you might want to destroy all evidence of a file or commit. (Perhaps somebody accidentally committed a confidential document.) This isn't so easy, because Subversion is deliberately designed to never lose information. Revisions are immutable trees which build upon one another. Removing a revision from history would cause a domino effect, creating chaos in all subsequent revisions and possibly invalidating all working copies.
The project has plans, however, to someday implement an svnadmin obliterate command which would accomplish the task of permanently deleting information. (See issue 516.)
In the meantime, your only recourse is to svnadmin dump your repository, then pipe the dumpfile through svndumpfilter (excluding the bad path) into an svnadmin load command. See chapter 5 of the Subversion book for details about this.
http://subversion.tigris.org/faq.html#removal
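The dump, filter and load sequence described in the FAQ can be sketched in Python as three processes connected by pipes. The function names and the repository paths are placeholders for the example; the new repository has to be created first with `svnadmin create`.

```python
import subprocess
from typing import List


def filter_pipeline(old_repo: str, bad_path: str, new_repo: str) -> List[List[str]]:
    """Commands for: svnadmin dump | svndumpfilter exclude | svnadmin load."""
    return [
        ["svnadmin", "dump", old_repo],
        ["svndumpfilter", "exclude", bad_path],
        ["svnadmin", "load", new_repo],
    ]


def run_pipeline(stages: List[List[str]]) -> None:
    """Start every stage, feeding each one's stdout into the next one's stdin."""
    procs = []
    upstream = None
    for i, cmd in enumerate(stages):
        is_last = i == len(stages) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=upstream,
            stdout=None if is_last else subprocess.PIPE,
        )
        if upstream is not None:
            upstream.close()  # the child now holds its own copy of the pipe
        upstream = proc.stdout
        procs.append(proc)
    for cmd, proc in zip(stages, procs):
        if proc.wait() != 0:
            raise RuntimeError(f"command failed: {' '.join(cmd)}")
```

For example, `run_pipeline(filter_pipeline("oldrepo", "trunk/big-file.bin", "newrepo"))` would rebuild the history in `newrepo` without the excluded path. Keep the old repository until you have checked the result, and expect existing working copies to need a fresh checkout.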
Like many programmers, I'm prone to periodic fits of "inspiration" wherein I will suddenly See The Light and perform major surgery on my code. Typically, this works out well, but there are times when I discover later that — due to lack of sleep/caffeine or simply an imperfect understanding of the problem — I've done something very foolish.
When this happens, the next step is to reverse the damage. Most easily, this means the undo stack in my editor… unless I closed the file at some point. Version control is next, but if I made changes between my most recent commit (I habitually don't commit code which breaks the build) and the moment of inspiration, they are lost. It wasn't in the repository, so the code never existed.
I'd like to set up my work environment in such a way that I needn't worry about this, but I've never come up with a completely satisfactory solution. Ideally:
A new, recoverable version would be created every time I save a file.
Those "auto-saved" versions won't clutter the main repository. (The vast majority of them would be completely useless; I hit Ctrl-S several times a minute.)
The "auto-saved" versions must reside locally so that I can browse through them very quickly. A repository with a 3-second turnaround simply won't do when trying to scan quickly through hundreds of revisions.
Options I've considered:
Just commit to the main repository before making a big change, even if the code may be broken. Cons: when "inspired", I generally don't have the presence of mind for this; breaks the build.
A locally-hosted Subversion repository with auto-versioning enabled, mounted as a "Web Folder". Cons: doesn't play well with working copies of other repositories; mounting proper WebDAV folders in Windows is painful at best.
As with the previous method, but using a branch in the main repository instead and merging to trunk whenever I would normally manually commit. Cons: not all hosted repositories can have auto-versioning enabled; doesn't meet points 2 and 3 above; can't safely reverse-merge from trunk to branch.
Switch to a DVCS and "combine" all my little commits when pushing. Cons: I don't know the first thing about DVCSes; sometimes Subversion is the only tool available; I don't know how to meet point 1 above.
Store working copy on a versioned file system. Cons: do these exist for Windows? If so, Google has failed to show me the way.
Does anyone know of a tool or combination of tools that will let me get what I want? Or have I set myself up with contradictory requirements? (Which I rather strongly suspect.)
Update: After more closely examining the tools I already use (sigh), it turns out that my text editor has a very nice multi-backup feature which meets my needs almost perfectly. It not only has an option for storing all backups in a "hidden" folder (which can then be added to global ignores for VCSes), but allows browsing and even diffing against backups right in the editor.
Problem solved. Thanks for the advice, folks!
Distributed Version Control. (mercurial, git, etc...)
The gist of the story is that there are no checkouts, only clones of a repository.
Your commits are visible only to you until you push it back into the main branch.
Want to do radical experimental change? Clone the repository, do tons of commits on your computer. If it works out, push it back; if not, then just rollback or trash the repo.
Most editors store the last version of your file before the save to a backup file. You could customize that process to append a revision number instead of the normal tilde. You'd then have a copy of the file every time you saved. If that would eat up too much disk space, you could opt for creating diffs for each change and customizing your editor to sequentially apply patches until you get to the revision you want.
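As a sketch of the numbered-backup idea, here is what such a save hook could look like in Python. It is not tied to any particular editor; the function names and the `name.~N~` naming scheme are assumptions for the example. Before overwriting a file, it copies the current version into a backup folder under the next free revision number.

```python
import re
import shutil
from pathlib import Path
from typing import Optional


def next_revision(backup_dir: Path, name: str) -> int:
    """Return 1 + the highest existing revision number for `name`."""
    pattern = re.compile(re.escape(name) + r"\.~(\d+)~")
    numbers = [
        int(match.group(1))
        for entry in backup_dir.iterdir()
        if (match := pattern.fullmatch(entry.name))
    ]
    return max(numbers, default=0) + 1


def save_with_backup(path: Path, new_text: str, backup_dir: Path) -> Optional[Path]:
    """Write `new_text` to `path`, first copying the old version to a numbered backup.

    Returns the backup path, or None if there was no previous version.
    """
    backup = None
    if path.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        revision = next_revision(backup_dir, path.name)
        backup = backup_dir / f"{path.name}.~{revision}~"
        shutil.copy2(path, backup)  # copy2 also preserves the timestamps
    path.write_text(new_text)
    return backup
```

Keeping the backups in a separate folder means a single ignore rule in the version control system covers them all.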
If you use Windows Vista, Windows 7, or Windows Server 2003 or newer, you could use Shadow Copy. The properties window for your files gets a 'Previous Versions' tab that lists earlier versions of the file.
The service should generate snapshots automatically, but just to be safe you can run the following command right after your moment of "inspiration" (note that /for takes a volume, not a folder):
'vssadmin create shadow /for=C:'
It has definitely saved my ass quite a few times.
Shadow Copy
I think it is time to switch editors. Emacs has a variable version-control, which determines whether Emacs will automatically create multiple backups for a file when saving it, naming them foo.~1~, foo.~2~ etc. Additional variables determine how many backup copies to keep.