Concurrency in a GIT repo on a network shared folder

Concurrency in a GIT repo on a network shared folder - windows

I want to have a bare git repository stored on a (Windows) network share. I use Linux, and have the said network share mounted with CIFS. My colleague uses Windows XP, and has the network share automounted (from Active Directory, somehow) as a network drive.
I wonder if I can use the repo from both computers, without concurrency problems.
I've already tested, and on my end I can clone ok, but I'm afraid of what might happen if we both access the same repo (push/pull), at the same time.
In the git FAQ there is a reference about using network file systems (and some problems with SMBFS), but I am not sure if there is any file locking done by the network/server/Windows/Linux. I'm quite sure there isn't.
So, has anyone used a git repo on a network share, without a server, and without problems?
Thank you,
Alex
PS: I want to avoid using an http server (or the git-daemon), because I do not have access to the server with the shares. Also, I know we can just push/pull from one to another, but we are required to have the code/repo on the share for back-up reasons.
Update:
My worries are not about the possibility of a network failure. Even so, we would have the required branches locally, and we'll be able to compile our sources.
But, we usually commit quite often, and need to rebase/merge often. From my point of view, the best option would be to have a central repo on the share (so the backups are assured), and we would both clone from that one, and use it to rebase.
But, due to the fact we are doing this often, I am afraid about file/repo corruption, if it happens that we both push/pull at the same time. Normally, we could yell at each other each time we access the remote repo :), but it would be better to have it secured by the computers/network.
And, it is possible that GIT has an internal mechanism to do this (since someone can push to one of your repos, while you work on it), but I haven't found anything conclusive yet.
Update 2:
The repo on the share drive would be a bare repo, not containing a working copy.

Git requires minimal file locking, and I believe file locking is the main cause of problems when using this kind of shared resource over a network file system. The reason Git can get away with this is that most of the files in a Git repo (all the ones that form the object database) are named as a digest of their content and are immutable once created. So the problem of two clients trying to use the same file for different content doesn't come up there.
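As a rough illustration of the naming scheme described above, here is a minimal Python sketch of how a blob's name can be derived from its content. It assumes the classic SHA-1 object format (a "blob <size>" header, a NUL byte, then the raw data); the function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object name of a blob holding `data`."""
    # The hash covers a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Two clients that write the same content compute the same name,
    # so they cannot disagree about what that file should contain.
    print(git_blob_id(b"hello world\n"))
```

Because the name is a function of the content, a second writer producing the same object would at worst rewrite identical bytes.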
The other part of the repo is trickier: the refs are stored in files under the "refs" directory (or in "packed-refs"), and these do change, although the refs/* files are small and always rewritten rather than edited in place. In this case, Git writes the new ref to a temporary ".lock" file and then renames it over the target file. If the filesystem respects O_EXCL semantics, that's safe. Even if not, the worst that could happen would be a race overwriting a ref file. Although this would be annoying to encounter, it should not cause corruption as such: it just might be the case that you push to the shared repo, and that push looks like it succeeded whereas in fact someone else's did. But this could be sorted out simply by pulling (merging in the other guy's commits) and pushing again.
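The lock-then-rename pattern can be sketched in Python as follows. This is only an approximation of the idea, not Git's actual implementation: the names read_ref and update_ref are hypothetical, packed-refs and reflogs are ignored, and the sketch assumes the filesystem really does honour O_EXCL (which is exactly the open question on a CIFS mount).

```python
import os
import tempfile


def read_ref(ref_path: str) -> str:
    """Return the current value of a ref file, or "" if it does not exist."""
    try:
        with open(ref_path, encoding="ascii") as f:
            return f.read().strip()
    except FileNotFoundError:
        return ""


def update_ref(ref_path: str, expected_old: str, new_value: str) -> bool:
    """Move a ref from expected_old to new_value; return True on success."""
    lock_path = ref_path + ".lock"
    try:
        # O_EXCL makes creation fail if another writer already holds the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    try:
        with os.fdopen(fd, "w", encoding="ascii") as lock_file:
            if read_ref(ref_path) != expected_old:
                # Someone else moved the ref first: refuse the update.
                return False
            lock_file.write(new_value + "\n")
        # Rename the finished lock file over the ref in one step.
        os.replace(lock_path, ref_path)
        return True
    finally:
        # Clean up the lock if it was not renamed into place.
        if os.path.exists(lock_path):
            os.unlink(lock_path)


if __name__ == "__main__":
    ref = os.path.join(tempfile.mkdtemp(), "master")
    print(update_ref(ref, "", "a" * 40))  # first writer
    print(update_ref(ref, "", "b" * 40))  # second writer with a stale view
```

In this sketch the second writer is refused rather than silently overwriting the first, which corresponds to the "pull, merge, push again" recovery described above.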
In summary, I don't think that repo corruption is too much of a problem here. It's true that things can go a bit wrong due to locking problems, but the design of the Git repo will minimise the damage.
(Disclaimer: this all sounds good in theory, but I've not done any concurrent hammering of a repo to test it out, and I only share repos over NFS, not CIFS.)

Why bother? Git is designed to be distributed. Just have a repository on each machine and use the push and pull mechanism to propagate your changes between them.
For backup purposes, run a nightly task to copy your repository to the share.
Or, create one repository each on the share and do your work from them but use them as distributed repositories from which you can pull changesets from each other. If you use this method, then performance of doing builds and so on will be decreased since you will be constantly accessing over the network.
Or, have distributed repositories on your own computers, and run a periodic task to push your commits to the repositories on the share.
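A periodic task of that kind could look roughly like the Python sketch below. The paths are placeholders, the function names are invented for the example, and it assumes each developer pushes to their own bare repository on the share. It relies only on `git -C <dir>` and `git push --all`, and would be scheduled with cron or the Windows Task Scheduler.

```python
import subprocess


def push_all_command(local_repo: str, share_repo: str) -> list:
    """Build the git command that pushes every local branch to share_repo."""
    # `git -C <dir>` runs the command as if it had been started in <dir>;
    # `push --all` pushes all branches to the given repository.
    return ["git", "-C", local_repo, "push", "--all", share_repo]


def run_backup(local_repo: str, share_repo: str) -> None:
    """Run the push and raise CalledProcessError if git reports a failure."""
    subprocess.run(push_all_command(local_repo, share_repo), check=True)


if __name__ == "__main__":
    # Hypothetical locations: a local clone and a personal bare repo on the share.
    print(push_all_command("/home/alex/project", "/mnt/share/alex-backup.git"))
```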

It sounds as if you would rather use a centralized version control system, so that the backup requirement is satisfied.
Perhaps with xxx2git in between so that you can work locally.

Related

How do I stop OneDrive from downloading git.exe on Windows?

I have used Git on Windows for a while, but recently changed the setting and got this.
On almost every command for Git Bash (also on PowerShell and Github Desktop) I get
git.exe is being downloaded on OneDrive
(translation may not be exactly the same)
The setting that changed recently is moving my repos to a OneDrive folder in order to have them synced between two sessions: that is work desktop and remote virtual machine.
I can see that this may not be ideal, but it really works for me since I have the same settings on both sessions, and I'm not really used to doing many commit-push-pull cycles. Not the main topic here, but feel free to comment.
(Edit): Upon reading the solution, there are other ways to set up this syncing that don't mess with the internals of Git. Look for those instead. Thanks.
In any case, the strange thing is that the notifications happen only on the Remote Virtual Machine, but not on the desktop.
I have seen some notifications about some files in the repos, which I attribute to OneDrive being nosy about every file I move. But I've also seen notifications for files I don't know about, and there's always git.exe attached to the notification.
For the first scenario, I have tried turning down the notifications for OneDrive. Some might say Microsoft has a history of not letting users set up their notifications, so I'm still looking.
Thanks.

Most file syncing tools like OneDrive and Dropbox operate by syncing data file by file. This is a great approach if you're working on a single word-processing document or spreadsheet. However, it's not as great when you're working with a Git repository.
When changing between branches or making a commit, Git changes and creates a lot of files all at once. In order to be synced correctly, all of the created files must be written in a similar order: all the blobs must be written, then the trees, then the commits, and then the refs can be updated. If you do this out of order, your repository can be corrupted, since you can have branches that refer to objects that don't exist (or objects that refer to other objects that don't exist).
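The ordering problem can be shown with a toy model in Python. This is not Git's real on-disk format: objects are simply a dict mapping an id to the ids it refers to, and every name here is invented for the example.

```python
def dangling_references(objects: dict, refs: dict) -> set:
    """Return ids that are reachable from a ref but missing from the store."""
    missing = set()
    seen = set()
    pending = list(refs.values())
    while pending:
        obj_id = pending.pop()
        if obj_id in seen:
            continue
        seen.add(obj_id)
        if obj_id not in objects:
            missing.add(obj_id)
        else:
            # Follow the links: commit -> tree -> blobs.
            pending.extend(objects[obj_id])
    return missing


if __name__ == "__main__":
    refs = {"refs/heads/main": "commit1"}
    complete = {"commit1": ["tree1"], "tree1": ["blob1"], "blob1": []}
    # A file-by-file sync that delivered the ref and commit before the tree:
    half_synced = {"commit1": ["tree1"], "blob1": []}
    print(dangling_references(complete, refs))
    print(dangling_references(half_synced, refs))
```

A repository written in the blobs, trees, commits, refs order never passes through the second state, whereas a tool that copies files in arbitrary order can.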
In addition, these tools can end up deleting files you wanted to have in your working tree or recreating files you didn't. So overall, you don't want to sync any Git repository using one of these tools.
You can write a bundle file with git bundle and sync that, or you can use rsync to sync a repository provided it's idle (not being modified) when you do. Note that if you sync a working tree, Git will need to refresh all files when you sync it across to the new machine, and also Git doesn't try to defend against untrusted users who have access to the working tree.
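As a sketch of the bundle approach, the following Python helpers build the relevant commands. The paths and function names are placeholders, and absolute paths are assumed because `git -C` changes the directory the command runs in. Only `git bundle create`, `git bundle verify` and `git clone` are used.

```python
import subprocess


def bundle_commands(repo: str, bundle_file: str) -> list:
    """Commands that write all refs of `repo` to one bundle file and check it."""
    return [
        ["git", "-C", repo, "bundle", "create", bundle_file, "--all"],
        ["git", "-C", repo, "bundle", "verify", bundle_file],
    ]


def clone_from_bundle_command(bundle_file: str, target_dir: str) -> list:
    """Command that recreates a repository from the synced bundle file."""
    return ["git", "clone", bundle_file, target_dir]


def write_bundle(repo: str, bundle_file: str) -> None:
    """Create and verify the bundle, stopping at the first failing command."""
    for command in bundle_commands(repo, bundle_file):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths: the bundle is the only file placed in the synced folder.
    for command in bundle_commands("/home/me/project", "/home/me/OneDrive/project.bundle"):
        print(" ".join(command))
```

The sync tool then only ever sees a single file, which it can transfer whole.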
It's also not a good idea to sync your Git installation itself via OneDrive, which is what it sounds like might be happening. Instead, install Git for Windows on each machine independently and don't try to sync it across. OneDrive should have configuration options that let you control what's synced.

Hg repository corruption when using Windows network shared directory

I hope I can get some help here, as the SO UX is better than the Mercurial mailing list.
I've been happily using Mercurial at home for years. I am also using it with Bitbucket Cloud for a couple of more serious (but still hobby) projects.
Last year I switched my team at work from SVN (company hosted) to Hg (self-hosted, with the central repo on a network location). We are all in Windows. Since then, we're continuously having problems with severe central repository corruption, which can only be resolved using backup, e.g.:
% hg verify --verbose
repository uses revlog format 1
checking changesets
checking manifests
manifest#92: unknown parent 1 ef0f96d78ab6 of ef0f96d78ab6
manifest#92: reading delta ef0f96d78ab6: integrity check failed on 00manifest.i:88
manifest#93: unknown parent 1 e336adb3580b of e336adb3580b
manifest#93: reading delta e336adb3580b: integrity check failed on 00manifest.i:89
manifest#94: reading delta 7243aebd542b: unknown compression type '\x08'
manifest#95: reading delta 899e4507ca01: unpack requires a string argument of length 12
manifest#96: reading delta 12d4d930da4f: Manifest had an entry with a zero-length filename.
...
Some people say we shouldn't use a network share for the central repository, due to problems with locking. Others explain that Mercurial doesn't use those locks, and network shares should work fine, unless there are problems with the file system.
Considering the latter, I wonder if I could somehow debug our installation without asking the company to provide a server for hg. I don't know much about the configuration we are using, but here is what I see. The directory is accessible via a Windows network path: \\domain.com\path\path\our-directory. Inside, we created a directory called root where .hg resides. In .hgrc, the path is accordingly
[paths]
default = \\domain.com\path\path\our-directory\root
Our network directory is backed up (by the company). Hg version is 4.9.

I have had a similar experience with a similar setup.
First thing to note is that I thought older HG versions definitely did have some problems when run over Windows network file shares, so make sure your version is current. (That was years ago, IIRC, so this may be unlikely to be the root cause of your present issue).
Secondly, in my case these problems seemed to be compounded from running HG from within a virtual machine. Instead I now run an [hg serve][1] instance on a PC which is not virtualized, and hit that with the various HG clients. No more problems.
It appeared that if the connection between the PC running hg serve and the file server was more reliable than from where I ran hg as a client, this avoided the problem. Apparently the HTTP connection hg serve uses to the client is itself more reliable.
I can't say that is a definitive solution because I never found a root cause. But this seems to have avoided any more corruption for quite some time.
Note that hg serve is built right into the standard hg command line tool, you can run it from anywhere easily, and it doesn't have to run on the same server where the physical repository is stored. So in my case I use it quite casually; (obviously) you might need to coordinate with your IT people if you need something more robust.

SVN - Steps to get all the files from a repository?

We have an existing repository on the network accessed via HTTP:.
Should I first import these files to my local machine? I tried importing directories, files, etc., everything is empty in my local folders. It says "success", but nothing ever shows up!
It doesn't make sense to create a repository on my side. But all the tutorials seem to say that, but then I think they're assuming you're starting from nothing.
My experience with Tortoise SVN has mostly been negative. Typically whatever I think I should do turns out to be incorrect, and I end up having to undo, and redo, or lose my work. Once I even managed to corrupt the main repository and it had to be restored from backup.
I absolutely cannot damage this existing repository!

If you're used to CVS or some older version control systems, note that SVN uses the same terms differently. In those, checkout often means lock in exclusive mode.
In SVN checkout will make a copy and automatically manage the revisions and help you merge from multiple sources. You don't need to lock a file, unless it's graphical or some other binary where merging doesn't make sense.
So in TortoiseSVN, you can checkout, and edit the files. The icons on the files will change to indicate their status.
SVN is easy in comparison to git, where the same terms are again redefined and significantly augmented!

Mercurial: is it possible to compress .hg folder to several large BLOBs?

Issue: cloning mercurial repository over network takes too much time (~ 12 minutes). We suspect it is because .hg directory contains a lot of files (> 15 000).
We also have git repository which is even larger, but clone performance is quite good - around 1 minute. Looks like it's because .git folder which is transferred over network has only several files (usually < 30).
Question: does Mercurial support "repository compressing to single blob" and if it does how to enable it?
Thanks
UPDATE
Mercurial version: 1.8.3
Access method: SAMBA share (\\server\path\to\repo)
Mercurial is installed on Linux box, accessed from Windows machines (by Windows domain login)

Mercurial uses some kind of compression to send data over the network (see http://hgbook.red-bean.com/read/behind-the-scenes.html#id358828), but by using Samba, you totally bypass this mechanism. Mercurial thinks the remote repository is on a local filesystem, and the mechanism used is different.
The linked documentation clearly says that the data is compressed as a whole before being sent:
This combination of algorithm and compression of the entire stream
(instead of a revision at a time) substantially reduces the number of
bytes to be transferred, yielding better network performance over most
kinds of network.
So you won't have the problem of 15,000 files if you use a "real" network protocol.
BTW, I strongly recommend against using something like Samba to share your repository. This is really asking for various kinds of problems:
lock problems when multiple people attempt to access the repository at the same time
file right problems
file stats problems
problems with symlink management if used
You can find information about publishing repositories on the wiki: PublishingRepositories (where you can see that Samba is not recommended at all).
And to answer the question: AFAIK, there's no way to compress the Mercurial metadata or otherwise reduce the number of files. But if the repository is published correctly, this won't be a problem anymore.

You could compress it to a blob by creating a bundle:
hg bundle --all \\server\therepo.bundle
hg clone \\server\therepo.bundle
hg log -R therepo.bundle
You do need to re-create or update the bundle periodically, but creating the bundle is fast and could be done in a post-changeset hook on the server, or nightly. (Any remaining changesets can be fetched by pulling from the repo after cloning from the bundle, if you set [paths] correctly in .hg/hgrc.)
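Setting [paths] after cloning from the bundle could be done by hand or with a small helper like the Python sketch below. The helper name set_default_path is made up, and it assumes the clone's .hg/hgrc is a plain INI file. Note that configparser drops comments when it rewrites the file, so treat this as an illustration rather than a polished tool.

```python
import configparser
import os


def set_default_path(clone_dir: str, live_repo: str) -> None:
    """Point the clone's [paths] default at the live repository."""
    hgrc = os.path.join(clone_dir, ".hg", "hgrc")
    config = configparser.ConfigParser(interpolation=None)
    config.optionxform = str  # keep option names exactly as written
    config.read(hgrc)  # a missing file is simply skipped
    if not config.has_section("paths"):
        config.add_section("paths")
    config.set("paths", "default", live_repo)
    with open(hgrc, "w") as f:
        config.write(f)
```

After something like set_default_path("therepo", r"\\server\path\to\repo"), a plain hg pull in the clone would fetch whatever was committed since the bundle was made.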
So, to answer your question about several blobs, you could create a bundle every X changesets, and have the clients clone/unbundle each of those. (However, having a single one updated regularly + a normal pull for any remaining changesets seems easier...)
However, since you're running Linux on the server anyway, I suggest running hg-ssh or hgweb.cgi. That's what we do, and it works well for us (with Windows clients).

Check in - Check out process/version control for PSDs and Image files

The title may not be so clear but the issue I am facing is this:
Our designers are working on large Photoshop files across the network, which causes a number of network traffic and file corruption issues that I am trying to overcome.
The way I want to do this is to have the designers copy the files to their machines (Mac OS X) and work on them locally. But the problem then is that they may forget to copy them back up, or that another designer may start work on the version stored on the network.
What I need is a system where the designer checks out the files or folders from the server which locks those files so no other user can copy them until they are checked back in. We do not need to store revisions for the files.
My initial idea was to use SVN or preferably GIT and force a lock on checkout somehow. Does this sound feasible, or is there a better system?

How big are the files on average? I'm not sure about GIT, as I haven't used it, but SVN should be OK. If you do go with SVN, I would trial checking out over HTTP/HTTPS versus a network path to the repo, as you may get a speed advantage out of one or the other. When we VPN to our repo at work, it is literally 100 times faster over HTTP than checking out using a network \\path to the repo.

SVN is a good option, but you will have revisions (this is the whole point of SVN). SVN doesn't lock files by default, but you may configure it so that it does. See http://svnbook.red-bean.com/nightly/en/svn-book.html?bcsi_scan_554E00F99A9AD604=0&bcsi_scan_filename=svn-book.html#svn.advanced.locking
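The locking setup mentioned above might look roughly like this, expressed as Python helpers that build the svn commands. The file name and helper names are placeholders. The sketch uses the svn:needs-lock property, which makes working copies read-only until a lock is taken, together with the svn lock and svn unlock subcommands.

```python
import subprocess


def needs_lock_command(path: str) -> list:
    """Mark a file so that working copies keep it read-only until locked."""
    return ["svn", "propset", "svn:needs-lock", "*", path]


def lock_command(path: str, message: str) -> list:
    """Take the exclusive lock before editing the file."""
    return ["svn", "lock", "--message", message, path]


def unlock_command(path: str) -> list:
    """Release the lock without committing."""
    return ["svn", "unlock", path]


def run(command: list) -> None:
    """Run one svn command and raise CalledProcessError on failure."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file inside an existing SVN working copy.
    print(needs_lock_command("designs/homepage.psd"))
    print(lock_command("designs/homepage.psd", "Editing the hero image"))
    print(unlock_command("designs/homepage.psd"))
```

The property change has to be committed once per file; after that, a designer locks the file, edits it, and commits, which by default releases the lock.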
I don't know git very well, but since it's not a centralized VCS, I'm pretty sure it isn't the right tool for your situation.
