Do any performance benchmarks exist?
I'm looking to create a repo and commit/push legacy code that runs several gigs deep.
Is either one faster, or smaller in footprint?
I apologize if this is too vague...
You don't choose between git and mercurial because of performance. They're both good.
Just do the kinds of things you'd be doing and measure. You're likely to get the largest performance variation on the first import -- that won't matter much. Keep digging.
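One way to "do the kinds of things you'd be doing and measure" is a small timing harness. The sketch below is only an illustration: the function names are made up here, and the placeholder workload should be swapped for the commands you really run (for example `git status` or `hg status` inside your own repository).

```python
import subprocess
import sys
import time


def time_command(cmd, cwd=None, repeats=3):
    """Run cmd `repeats` times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (fastest, slowest, mean) for a list of durations."""
    return min(durations), max(durations), sum(durations) / len(durations)


if __name__ == "__main__":
    # Placeholder workload so the script runs anywhere; replace it with
    # your own operations, e.g. ["git", "status"] or ["hg", "status"].
    workload = [sys.executable, "-c", "pass"]
    fastest, slowest, mean = summarize(time_command(workload))
    print(f"fastest={fastest:.3f}s slowest={slowest:.3f}s mean={mean:.3f}s")
```

Running each operation a few times and looking at the spread, not just one number, helps separate cold-cache effects from steady-state behaviour.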
Space-wise, the one place git will definitely win is if you have the same content in lots of different paths in its lifetime. That is, if your several gigs of files get moved. git's model supports this better than hg's. That very well may not matter to you.
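The point about the same content living at many paths can be illustrated with a toy content-addressed store. This is a simplified sketch, not Git's actual object format (it ignores object headers, trees and packfiles), and the class and path names are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: objects are keyed by a hash of their bytes."""

    def __init__(self):
        self.objects = {}  # hash -> content, stored once per distinct content
        self.paths = {}    # path -> hash of the content currently at that path

    def add(self, path, content):
        key = hashlib.sha1(content).hexdigest()
        self.objects.setdefault(key, content)  # no new object if content is known
        self.paths[path] = key
        return key

    def move(self, old_path, new_path):
        # Moving a file only changes the path table, not the stored objects.
        self.paths[new_path] = self.paths.pop(old_path)


store = ContentStore()
store.add("legacy/big.dat", b"x" * 1000)
store.move("legacy/big.dat", "archive/big.dat")
store.add("copy/big.dat", b"x" * 1000)  # same bytes under another path
print(len(store.objects), sorted(store.paths))
```

However many paths point at the same bytes over the repository's lifetime, the toy store keeps one object for them.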
In both cases, you should consider whether your several gigs of repository actually represents the source code for a single project.
But again, it would be unwise to choose between these two similar and active projects because of raw performance.
A January 2011 comparison of Mercurial and Git server performance concluded that Mercurial gives steadier performance than Git, but that Git is faster on average.
Original Answer (March 2011, when GitHub was less than 3 years old)
The performance that matters for a DVCS (which performs all operations locally anyway) is the performance of your daily tasks:
merge (how quickly do you decide between the various branching models, especially in Mercurial?)
publication workflow (how quickly do you set up a push/pull workflow?)
integration (how quickly do you integrate Git with an IDE, or with a webapp like Hudson, Jira, Redmine, Trac, or ...?)
setup (how quickly do you set up a centralized repository, and with what kind of authentication mechanism? That matters if you use a DVCS in an enterprise environment)
The raw performance of basic operations isn't that relevant, provided you understand the limits of a DVCS: you cannot have one single repo into which you put everything (all projects, or all kinds of files, like binaries).
Some reorganization into modules must take place to define the right number of repos, one per "module" (coherent file set).
Update 2018, seven years later: Windows support for Git is now a reality, and it aims at improving the performance/scalability of Git.
To illustrate that, Microsoft has its entire Windows codebase in one (giant) Git repository: see "The largest Git repo on the planet": 3.5M files, 300GB, 4,000 engineers producing 1,760 daily “lab builds” across 440 branches in addition to thousands of pull request validation builds.
But this is with the addition of GVFS (Git Virtual FileSystem), which lets you dynamically download only the portions you need based on what you use.
This is not yet native to Git, although its integration began in December 2017 with the implementation of narrow/partial cloning.
As @MartinGeisler pointed out in his answer, the commit time is very small (if you commit from the command line, your shell returns immediately).
What takes quite long are the network clones/pushes/pulls. Google published a small benchmark (see footnote 1) when they had to choose a DVCS for Google Code, but it is quite old (summer 2008).
Eric Sink has published the results of a benchmark for SVN, Bazaar, Mercurial, Git and his own Veracity.
Unfortunately it's just a single operation (a commit) on a single code base (Valgrind), and I am not sure which versions he used for these VCSs, but in any case they must be pretty old, as the article dates back to 2011. I guess this is why Eric himself calls them "Ridiculously Unscientific Benchmarks". Anyway, for what it's worth:
SVN is much slower than the others (almost 22 seconds), but all the others are similar (between 3 and 5 seconds). Git is clearly the fastest, and in percentage it's even much faster than Mercurial (which takes 43% more time), but actually we are talking about a difference of 1.4 seconds - hardly noticeable.
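As a sanity check, the two quoted figures (43% more time, 1.4 seconds apart) pin down the approximate timings. The numbers below are back-calculated from those rounded figures, so they are estimates and not Eric Sink's actual measurements; the function name is made up for this sketch.

```python
def implied_times(extra_fraction, difference_seconds):
    """If the slower tool takes `extra_fraction` more time than the faster one
    and the absolute gap is `difference_seconds`, recover both timings."""
    faster = difference_seconds / extra_fraction
    slower = faster + difference_seconds
    return faster, slower


git_s, hg_s = implied_times(0.43, 1.4)
print(f"Git ~ {git_s:.2f}s, Mercurial ~ {hg_s:.2f}s")
```

Both values land between 3 and 5 seconds, which agrees with the range quoted above and shows why a 43% relative gap still amounts to very little wall-clock time.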
Apart from this, I can't find the sources right now, but I've read several times that Git is faster, though the difference is trivial (which confirms this test made by Eric). So I wouldn't worry too much about speed when choosing which one to go with.
There are a lot of articles about SVN vs. Hg in general.
I would like to concentrate only on performance.
Real-life experiences preferred.
Here is my set-up:
(future setup) Windows with IIS for Hg
(current setup) SVN 1.3.2 on top of Apache under Windows
I would like to have statistics for the most common operations (commits, stats, local/remote pulls, pushes, etc...). I am not really sure what the most common operations for Hg are.
Performance is NOT the only thing that matters to us, but it is highly important and may be the crucial decision point in switching to Hg.
However, I would like to see some statistics. How long did it take to clone a repo of 5 GB? Or something like that.
First, maybe upgrade your Subversion?
I think the productivity cost of choosing the one that doesn't match the more natural workflow would dwarf any performance differences.
Let's say that Subversion outperforms Mercurial in every way. Great! If the way that you and co-workers use your repository would be better suited to Mercurial, you've become performance-wise and productivity-foolish.
Unless what you're controlling is really huge, you probably won't notice the differences. Choose the one that offers the best capabilities for how you expect to use it.
It depends on so many factors (and no, merging isn't the most common operation - commit is, putting your changes into the thing is the single most important part of it even if DVCS systems do spend all their time merging up and downstream).
So, firstly you need to upgrade your SVN. That's easy, and once you've run 'svnadmin pack' on your repo you'll be able to compare properly. 1.3.2 is ancient! (current version is 1.6.11)
Second, you need to decide whether pushing and pulling large repositories about is important to you. For example, I have a 12 Gig repo to manage. Fortunately, svn allows us to only fetch parts of that, not the whole thing, so management of it is much improved.
Also, there are significant performance improvements coming in v1.7 (ready oh so soon). Performance hasn't been a priority for the SVN guys; they've really been adding features and ensuring rock-solid stability instead. Now performance is an issue and is being addressed. Take a look at the dev mailing list to see. It might be worth your while to wait a little (or evaluate it using a copy of your repo).
You see, performance may well be the same with your system. It could be bottlenecked on IO, which is where svn usually fails (though the dev mailing list does have some perf figures from a chap with a monster server with RAID-0 SSDs and 24 GB RAM, and strangely enough it's bottlenecked on CPU!).
So all in all, you have to figure out your workflow and processes. If Mercurial (which is a good choice) provides that for you, then great, go for it. But if it doesn't, then migrating isn't going to help you no matter how much faster it might be.
(Not exactly an answer, but still it can provide a useful context to your question:)
As mentioned in DVCS tools, the most common operation in a VCS, especially a distributed one, is: merge.
And that, whatever your setup is, will always be easier and quicker to do with Mercurial than SVN. See:
"What makes merging in DVCS easy?"
"SVN merges and DVCS"
"Why is branching and merging easier in Mercurial than in Subversion?"
"What makes some version control systems better at merging?"
"Merging: hg/git vs. svn"
IIRC, svn is generally quicker for initially getting a remote repo, as it doesn't need to pull the history. After that, hg is generally quicker, as it uses local data much more and has a more compact DB, so like-for-like remote operations are quicker.
At the moment I can only find this as any sort of direct comparison. It is somewhat flawed in that filling any of those VCSs with image files is not really what they are for.
If you really want good stats, then why not test it yourself with your own repo?
I actually know that it's better to have version control, but I wonder whether Time Machine doesn't do a good enough job of this for lazy programmers?
Because Time Machine won't track the log of file changes.
e.g. for a given file in Subversion I can easily determine when it changed, and what else changed at the same time.
It won't tell you who changed it, but I assume you're discussing single-user cases in this instance.
And finally you can't tie checkins and changes to builds/deployments.
Installing Subversion (or similar) on Mac OS is trivial, and won't consume resources unless you're checking in/out. It's strongly recommended. And of course make sure your repository is backed up with Time Machine!
They're actually very similar, but Time Machine doesn't keep track of what version of a file matches a particular build of the system. It's very useful to have the metadata.
Time Machine does not keep all backups (it only keeps hourly backups for a day, and only keeps daily backups for a month), so you will not have every version of the file if you need to go back and figure out exactly what change introduced a problem. It doesn't have diff support built in, though you could probably do a diff -r on the directories in question. It also does not record any commit messages, which can be invaluable in figuring out why someone changed the code in a particular way. Finally, it does not help you coordinate between multiple people or one person on multiple computers; one of the biggest benefits of version control is that it helps you distribute your code and merge in changes from multiple people.
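The retention point can be made concrete with a small simulation. This is a sketch under a deliberately simplified policy taken from the description above (hourly snapshots kept for a day, daily snapshots kept for a month); the snapshot alignment and all names are assumptions, not Time Machine's real behaviour.

```python
from datetime import datetime, timedelta


def snapshot_times(now, hourly_for=timedelta(days=1), daily_for=timedelta(days=30)):
    """Snapshot instants still retained at `now` under the simplified policy."""
    times = []
    t = now
    while now - t <= hourly_for:      # one snapshot per hour for the last day
        times.append(t)
        t -= timedelta(hours=1)
    t = now - hourly_for - timedelta(days=1)
    while now - t <= daily_for:       # then one snapshot per day for a month
        times.append(t)
        t -= timedelta(days=1)
    return times


def surviving_versions(save_times, now):
    """Saves captured by at least one retained snapshot.
    A snapshot only holds the most recent save made before it."""
    kept = set()
    for snap in snapshot_times(now):
        earlier = [s for s in save_times if s <= snap]
        if earlier:
            kept.add(max(earlier))
    return sorted(kept)


now = datetime(2024, 1, 31, 12, 0)
saves = [
    datetime(2024, 1, 28, 9, 30),   # three edits, one hour apart, three days ago
    datetime(2024, 1, 28, 10, 30),
    datetime(2024, 1, 28, 11, 30),
    datetime(2024, 1, 31, 11, 30),  # one edit today
]
for version in surviving_versions(saves, now):
    print(version)
```

In this model each of the three older edits would once have had its own hourly snapshot, but after three days only the last of them is still recoverable. A version control system keeps all of them for as long as the repository exists.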
If you want to have quick and easy version control, I'd recommend one of the distributed version control systems. Unlike centralized systems where you have to set up and administer a server, getting started with a distributed system is usually as easy as git init; git add .; git commit. I generally prefer Git, but Mercurial is also a good choice and some find it easier to get started with.
Time Machine will give you backups of your code. That's it.
Source control will give you this too, but it provides a lot more. The biggest addition is branching and merging, which is a huge help on projects.
Time Machine also won't store metadata like checkin comments, or who made what changes.
Finally, Time Machine can't really be distributed. If you want someone else to start working on your code as well then Time Machine won't help. But (good) source control would make collaboration very easy.
The single most important and valuable thing in a version control repository is not the content, it's not even the history (that's #2), it's the commit messages. And you don't have those in a backup program.
On the PC, active backup does pretty much the same thing, and also allows revision comparison, albeit limited to a file-by-file basis. The problem comes when you want to do something like list all the differences between version X and version Y, with the intention of building version Z based on a combination of the two. These types of backup systems are OK until you are simultaneously maintaining multiple versions of the same project.
Branches & Tags?
For the reasons everybody else has pointed out, use source code control. It's easy to set up and will solve a lot of problems. It will tell you who changed what when, and why (if you use commit messages properly), and give you details. It's invaluable when you realize that something went wrong within the past few days.
Once you've done that, use Time Machine to keep backups of the repository. A version control system is not a backup system. Nothing is safe that is only on one file system.
Don't forget about branching; it's one of the most powerful tools an SCM can give you. I don't know if Time Machine has some kind of 'alternate reality' feature.
My development team uses SourceSafe at a very basic level. We're moving into some more advanced and extended development cycles, and I can't help but think that not using branching and merging to manage changes is going to bite us very soon.
What arguments did you find most useful in order to convince your team to move to a better solution like SVN?
What programs did you use to bridge the functionality gap so that the team wouldn't miss the IDE SourceSafe integration?
Or should I just accept SourceSafe and attempt to shoehorn better practices into it?
Reliability
SVN is a lot more reliable with large databases
SVN is still actively supported
Atomic commit - in VSS, when you get the latest version while another user is performing a check-in, you can get an inconsistent state. In the best case this forces you to repeat "Get latest version", but if you are unlucky you may be left with a codebase which compiles but does not work. This cannot happen in SVN thanks to atomic commits.
Features
SVN branch/merge is a lot better
SVN has built-in support for remote access
SVN is more configurable (integration of external Diff/Merge tools)
SVN is more extensible (hooks)
Better productivity
SVN "Update" is a lot faster compared to SS "Get latest version"
SVN command line is a lot easier and cleaner - this is useful for automated build or testing tools
Same level of IDE Integration
VSS had a lot better VS integration until recently, but with AnkhSVN 2.0 this is no longer true.
Open
SVN is open and there are plenty of tools using SVN or cooperating with it. Some examples include:
integration with many bug tracker or product cycle management products
shell integration
integration into various products
various management and analysis tools
source is available, you can adjust it to your need, fix the problems (or hire someone to do it for you) should the need arise
Cost
You do not have to pay any license or maintenance fees
First, teach them how to use SourceSafe in an efficient way.
If they are smart enough, they will begin to love the advantages of using a version-control system, and if so, they will soon reach the limits of SourceSafe. That's when they will be most willing to listen to your arguments for switching to a better VCS, whether a CVCS or a DVCS, depending on what the team is ready to achieve.
If you try to force them to use another VCS while they are still using SourceSafe the wrong way, like saving zip files of source code (don't laugh, that's how they were acting in my company two years ago), they will be completely resistant to any argument, however good it may be.
Find some excuse to start using non-ASCII characters in your C# code (Chinese and Japanese are excellent for this).
SourceSafe doesn't like Unicode (even though Visual Studio does), so if you choose the right Unicode text and check a file in and back out, your entire file will appear as corrupted gibberish. The beauty of this is that because SS uses a "diff" versioning system, this actually corrupts the file all the way back to the original check-in version, and can't be fixed automatically.
When this happens just one time (as it did to me when working on an application that had to support Japanese), you will probably find it to be a decisive argument in favor of dropping SourceSafe.
There were two features that we used to sell management and the team on SVN over VSS.
1) The ability to branch. When using VSS, when a release was scheduled to go out, the entire repository was locked until the release actually went out. This included the test and fix cycle. So, developers were unable to commit anything other than fixes for the release to the VSS repository. This resulted in long integration sessions immediately following each release. With the use of release branches in SVN, there is no longer any need to lock the entire repository.
2) The ability to roll back an entire change at once. Because SVN records all files changed in a single, atomic commit, it is trivial to revert a problematic change. In VSS, a developer had to go through the entire repository, find every file changed at about the same time, and revert each change to each file individually. With SVN, this is as trivial as finding the relevant commit and hitting the "Revert Changes from this Commit" button in TortoiseSVN.
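Why an atomic commit makes rollback easy can be shown with a toy model. This is an illustrative sketch only (the `ToyRepo` class and file names are hypothetical, and real SVN reverts by reverse-merging a revision, which also copes with later edits to the same files).

```python
class ToyRepo:
    """Toy model: history is a list of changesets, each mapping path -> content.
    A content of None means the file is deleted by that changeset."""

    def __init__(self):
        self.changesets = []  # list of (message, {path: content or None})

    def commit(self, message, changes):
        """Record all file changes as one indivisible changeset; return its index."""
        self.changesets.append((message, dict(changes)))
        return len(self.changesets) - 1

    def state(self, upto=None):
        """Rebuild the tree by replaying changesets 0..upto (all if upto is None)."""
        end = len(self.changesets) if upto is None else upto + 1
        tree = {}
        for _, changes in self.changesets[:end]:
            for path, content in changes.items():
                if content is None:
                    tree.pop(path, None)
                else:
                    tree[path] = content
        return tree

    def revert(self, index):
        """Commit the inverse of changeset `index`: every file it touched goes
        back to what it was just before that changeset."""
        before = self.state(index - 1) if index > 0 else {}
        message, changes = self.changesets[index]
        inverse = {path: before.get(path) for path in changes}
        return self.commit(f"Revert: {message}", inverse)


repo = ToyRepo()
repo.commit("initial", {"a.c": "v1", "b.c": "v1"})
bad = repo.commit("risky change", {"a.c": "v2", "b.c": "v2", "c.c": "new"})
repo.revert(bad)
print(repo.state())
```

Because the changeset itself lists every file it touched, undoing it is one lookup. Without that record you are back to hunting for files that changed "at about the same time".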
As a side note, we use TortoiseSVN and everyone loves the file overlay icons for seeing what has and has not changed.
Whatever you do, move slowly! Don't start talking to them about branching on Day 1 -- it will just put them off. I'm stereotyping VSS users with that comment, but that's what I see out there.
For the developers: sell it as a replacement for VSS that works better and faster. Use VisualSVN on Day 1 so they have a super-shallow learning curve. Sell them on it being the same except faster, more stable, and 2 people can edit the same file and they won't have problems with some guy being off sick with locks on a bunch of files.
For the admins: sell them on it being more stable and easier to administer than VSS. Show them VisualSVN server.
Good luck!
First, document all the problems you are having that can be traced to root causes within the source control system. Keep track of them for a month or so. Add on top of that the missed opportunities resulting from not using a better one. (If you say "opportunity costs of not using Subversion" you may impress an MBA-type manager.) These numbers are actually an underestimate of the opportunity cost, because presumably you could have been doing work that provides more value than your hourly bill rate if you weren't messing around with VSS.
For example, do you have problems where files are locked that need to be accessed by more than one person?
Have you had problems with partial (non-atomic) check-ins?
Do you have problems where it is difficult for you to keep track of releases of the software and recreate the repository as it was in the past?
Do you have problems getting a copy of the code onto a server that doesn't have a sourcesafe client?
Do you have problems automating your build and testing process because continuous integration tools can't monitor your version control systems for updates?
I am sure you can think of many others.
If you can figure out the approximate time/money costs of problems caused by SourceSafe and the benefits of things that Subversion provides (using a generic number like $100/hr for labor costs, or just hours) and any costs of late delivery of projects, do so. If you have collected data for a month or so, you can show the benefit of using Subversion per month.
Then present the approximate time/cost of moving to Subversion (about 8 hours to set up and migrate code, and 2 hours per developer to connect, check out and move projects, something like that). The risk is low, since SourceSafe is still there to roll back to.
If the cost is more than the monthly benefit, you can divide the cost by the benefit to figure out the recovery period. You should also total it up over 3 years or so to show the long term benefit. Again, emphasize that the real opportunity cost is not directly calculable because you could have been adding value during the time you were trying to manage non-branched releases in sourcesafe.
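The arithmetic above fits in a few lines. The 8 setup hours, 2 hours per developer and $100/hr come from the text; the team size of 5 and the 6 hours lost per month are hypothetical placeholders to replace with the data you collected.

```python
HOURLY_RATE = 100          # generic labor cost from the text, in dollars
SETUP_HOURS = 8            # set up the server and migrate the code
HOURS_PER_DEVELOPER = 2    # connect, check out and move projects

developers = 5             # hypothetical team size
hours_lost_per_month = 6   # hypothetical: measured time lost to VSS problems


def recovery_months(migration_cost, monthly_benefit):
    """Months needed for the monthly benefit to pay back the one-off cost."""
    return migration_cost / monthly_benefit


def net_benefit(migration_cost, monthly_benefit, months=36):
    """Total benefit over `months` minus the one-off migration cost."""
    return monthly_benefit * months - migration_cost


migration_cost = (SETUP_HOURS + HOURS_PER_DEVELOPER * developers) * HOURLY_RATE
monthly_benefit = hours_lost_per_month * HOURLY_RATE

print(f"migration cost:   ${migration_cost}")
print(f"recovery period:  {recovery_months(migration_cost, monthly_benefit):.1f} months")
print(f"3-year net gain:  ${net_benefit(migration_cost, monthly_benefit)}")
```

With these placeholder inputs the migration costs $1800 and pays for itself in 3 months; your own measurements will move those figures in either direction.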
Nobody recommends using SourceSafe any more, not even Microsoft. They will now offer you an (expensive) TFS licence instead. SourceSafe is just not reliable.
I wrote about it here: Visual SourceSafe on E2. It's a bit of a rant, but that's because I had to use SourceSafe for quite a while, and the memory makes me froth at the mouth a bit.
Reliability is the big one that will bite you. But there are also features that you may appreciate in SVN or TFS:
TFS and SVN both have atomic commits of multiple files, but Sourcesafe does not - if you check in two files "at once", it's not one operation, it's the same as checking in one of the files, then checking in the other. You can get at the state in between, where one file has been checked in, but not the other.
SourceSafe does not keep history of deleted files, file moves or renames.
Contrary to initial impressions, SourceSafe does support multiple simultaneous checkouts of the same file, if you set the right options. But TFS and especially SVN are better designed for this way of working.
Unlike SourceSafe, TFS and SVN both work fine against servers on the internet (TFS just OK, SVN excellently) and SVN works well offline - e.g. if you have a laptop on a plane or train and no 'net, you can still work and compare to previous revisions or even revert, since the data to do that is held locally.
As someone else pointed out, SourceSafe, like CVS, is a "dead" product. It is not being actively developed. TFS and SVN will have next versions out some time in the future.
First search google for the sheer quantity of pages describing how bad VSS is and share that with your coworkers.
Second, skip subversion and go straight to a proper distributed SCM like git or mercurial. Because merging is such an inherent part of distributed SCMs, they have to handle merges much better than centralized systems like svn. Subversion is still trying to retrofit itself to handle branching better, where the distributed systems were built correctly to begin with.
The AnkhSVN plugin for VS is pretty good. It's got a few oddities but on the whole works well.
Convincing the team to move is hard work - I never managed it :-( Probably one of the more practical arguments though is speed - VSS is s-l-o-w when you've got a 1GB source database and several users.
Edit: It's been so long since I used VSS that I forgot it was locking! Yes, as mentioned here, the ability to move to a non-exclusive/merge-changes model should help if you've got more than a handful of developers. It saves yelling "Can somebody check in the common includes" across the office!
You say "What arguments did you find most useful in order to convince your team to move to a better solution like SVN?"
If you don't know that it's a better solution, then why are you making the arguments? If your mind is made up enough to go argue for a solution, you should know what those reasons are already.
What convinced you that you should move to something better? Those are your arguments right there. Anything short of those arguments will sound like it's just an issue of personal preference.
TortoiseSVN (free) is really nice for Explorer integration, giving you all the features of svn from a context menu.
VisualSVN (commercial) makes it just as easy to integrate svn into Visual Studio, with the same status indication in the solution browser as well as context menus to use all the Subversion features.
Both these tools go a long way to making version control seamless. It's been a couple of years since I dealt with VSS, but these tools are a way nicer way to use source control.
Ditto for what everyone has said about VSS being poop
Subversion has good support for branching and doing merges... I don't remember VSS having any capabilities in this department at all. I do remember teams going through pain of week long merges when needing to release from VSS, pain which just doesn't exist anymore with Subversion.
Build some automation that mirrors the VSS repository into a SVN repository
It takes time to build a consensus. If your SVN mirror of the VSS repository is available at all times, it will be easier to accumulate converts. The mirror doesn't have to be perfect- it just has to be usable. There are existing tools for this purpose.
Tell them to treat the source code as if it was money and point them to the numerous examples of SourceSafe coming down in flames taking the source with it. Things like that are just not supposed to happen in a proper source control system.
The best argument against SourceSafe is that it just isn't Safe; everything else can potentially be called "features we don't need".
The clincher for us was the speed (i.e., the lack thereof) of VSS over VPN and low bandwidth hotel networks on the road and the problems of trying to tunnel through firewalls so that two teams at two different sites could quickly, securely, and reliably work from the same code repository. We were running two VSS repositories and packaging up "deliveries" that had to be merged into the other site's repository to keep them in sync.
The team grumbled for a while, but quickly got over it. TortoiseSVN is fantastic by itself and the AnkhSVN plug-in for Visual Studio really eased everyone into the changeover.
Looking back, I can't believe how many "Can you check in file SoAndSo?" e-mails we sent around, not to mention the "SourceSafe is down. We've got to restore the repository" e-mails.
Sheesh. After reading these comments and writing this response, I can't believe we put up with VSS for as long as we did.
Web page summarising problems with VSS - just point people to that URL
If you use VisualSVN the team won't miss VSS as much. 2 people being able to work on one file at the same time is a big selling point too.
The unreliability of SourceSafe ("please fix the repository...") was enough of a sell for us. Anecdotally (I've never measured it), SVN also always seems faster. Good concurrent checkouts / merging.
I'd always figured that to a developer it was almost too obvious. SourceSafe just seems to break and die all too often to not want to replace it...
Tell them to read this http://www.highprogrammer.com/alan/windev/sourcesafe.html
I would recommend that you go ahead and start introducing best practices to your SourceSafe usage with a view to changing to Subversion further down the line. Hopefully this will make your actual Subversion migration easier and give you time to plan out your development cycles, branching strategies et al. properly.
The other thing to consider is your development process in general. A source control management system is only ever part of the solution; to get the most out of Subversion or any other product you will probably want to look at how its usage interacts with your code review, QA and build processes.
I don't remember any SourceSafe user ever liking the product. Do your colleagues actually like it?
I've got a similar issue with CVS at my current customer's usage. Since "it works" and they are mostly pleased with it, I cannot push them to change. But daily I sure wish they would!
When I was at the launch for VS2005 I managed to corner a Microsofty and ask why SourceSafe was so awful to use. The reply I got was rather shocking, not just because of what he said but because he was so up front about what he'd said.
He told me that it was only really meant for one person to use and even then it wasn't very good at doing that.
My colleagues and I were so shocked that we couldn't think of much else to do other than laugh out loud, as did the Microsofty! He then told us that it wasn't used internally.
So, we switched to subversion shortly after that. We'd pretty much decided to go for it before the launch event, but that just confirmed we'd made the right decision.
We used to use SourceSafe. Then, when I joined the team I was in a different location and even though we have a fairly good LAN when I tried to check out the latest version it took 40 minutes. I persuaded them to convert to CVS (we now use SVN) and the checkout time dropped to a couple of minutes. SourceSafe was just too slow to be usable at a remote location.
We moved from SourceSafe to SourceGear Vault. This source control engine is very comfortable for someone used to SourceSafe. We finally decided to make the change after a couple of SourceSafe corruption incidents that came at critical times. So my advice would be to focus your sales presentation on SourceSafe's unreliability.
Surely using source safe is enough reason to want to migrate to another source control system?
I used SVN and CVS at my old job and have moved to a company that uses SourceSafe (we are going to migrate to SVN), and just using VSS has been enough for me to take a serious dislike to it. I went in with an open mind: despite many of my colleagues from my previous job telling me horror stories about VSS, I assumed that it would have gotten better since they used it.
Not being able to edit a file because someone else is/was editing it is ridiculous. I've tried to move to more distributed versioning systems like Bazaar, which is made by Canonical; however, it's not mature enough in terms of the tools available.
SourceSafe gets in the way of development, whereas SVN helps you almost every step of the way.
Plus, using TortoiseSVN made code reviews a lot easier.
Only to the extent that you are able to herd a bunch of cats. I've been there twice, and in both cases it took some serious problems in SourceSafe before people saw the light. As a manager, on the other hand, I simply directed the team to use SVN and our productivity increased by 300% (this was working with a group in India and in the US; we had code exchanges that used to take a long time before SVN).
Also Trac mounts on top of Subversion. It's free and a great way to view the repository (timeline, wiki, etc)
As you're making these arguments, consider whether you need to address any policy your company may have about using open source tools. See this answer to a prior question: Switching source control
Make them use it and they will switch to something else :)
Now, being serious, tell them it's not that hard to use. Many developers I've known refused to switch because they associated Subversion with Unix and weird commands. Show them interfaces like TortoiseSVN or VisualSVN, and tell them that Subversion lets them edit the same file without the forced locking that VSS imposes.
And last but not least, it is Open Source. It has lower cost than buying Team Foundation Server and if you look around you will see that small teams of developers work quite well with SVN.
I used SourceSafe on a small development team and was responsible for keeping it running.
I found the database gets corrupted pretty easily, and there isn't much recourse when that happens. The "repair" feature (as with most any Microsoft repair feature) just doesn't work 98% of the time.
Naturally, when our database became corrupt, we tried to restore from our backup archive. That was when we discovered the other bad thing about SourceSafe: its 2GB archive limit. We were making backups at our office for months before we ever realized that they couldn't be restored and were useless.
SourceSafe is just a disaster waiting to happen.
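The backup trap described above (archives silently growing past a size limit) is the kind of thing a scheduled check could flag early. This is a minimal sketch: reading "2GB" as 2 GiB is an assumption, and the function and file names are hypothetical.

```python
import os
import tempfile

ARCHIVE_LIMIT_BYTES = 2 * 1024 ** 3  # assumed reading of the "2GB" limit


def oversized_archives(paths, limit_bytes=ARCHIVE_LIMIT_BYTES):
    """Return the paths whose file size is at or above limit_bytes."""
    return [p for p in paths if os.path.getsize(p) >= limit_bytes]


if __name__ == "__main__":
    # Demonstration with a throwaway file and a tiny limit, so it runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "backup.ssa")  # hypothetical archive name
        with open(archive, "wb") as f:
            f.write(b"\0" * 4096)
        print(oversized_archives([archive], limit_bytes=1024))
```

A size check like this is no substitute for the real lesson of the story: periodically restore a backup to prove it works.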
I'm planning on ditching SourceSafe in the next few weeks, after over a decade of putting up with it. Mostly I've been using it within the context of a small (< 5 person) team, and not had to do a lot of branching because there's been no call to do it.
However, the #1 problem for me, and always has been, is that the damn thing is so prone to corruption - if you have your SS database (lol, database; collection of randomly named files more accurately describes it) on a network drive, and something happens to your LAN connection partway through an add/checkin operation - 9 times out of ten you get "invalid handle" and the damn thing is corrupted in some way, and then you get to play Russian Roulette with the Analyzer tool.
I realised, a couple of months back, that for the past decade I had been making local zipped up copies of the source at every release of the software I was working on, because I didn't trust the source control system. What a waste of time.
So, it's going. I'll probably use Subversion and TortoiseSVN, because I think the team will need a UI to ease the transition.