Best Practice: Removing obsolete artifacts from UCM ClearCase

We have a stream in ClearCase UCM. We create views on this stream and fetch the code for builds. The total data copied is 10 GB. This is a huge codebase, so I decided to investigate what makes it so large.
I found:
1) Multiple versions of third-party applications are stored in ClearCase
2) But only the latest third-party applications are used by our application
3) There is a lot of obsolete and redundant code
I proposed:
1) Removal of old versions of third-party applications using rmname (NOT rmelement), which keeps the element history available
2) Removal of all redundant code
A total of 5 GB of obsolete data has been detected.
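A figure like this can be cross-checked by totalling file sizes per top-level directory of a view. The following is only a minimal Python sketch, assuming the view's files are loaded on local disk; the function names are illustrative and nothing here is ClearCase-specific.

```python
import os
from collections import defaultdict


def size_by_top_level_dir(root):
    """Return {top-level entry under root: total bytes of the files below it}."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files that sit directly in root are grouped under ".".
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Count regular files only; symbolic links are skipped.
            if os.path.isfile(path) and not os.path.islink(path):
                totals[top] += os.path.getsize(path)
    return dict(totals)


def report(root):
    """Print the largest top-level directories first, sizes in MB."""
    totals = size_by_top_level_dir(root)
    for name, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size / 1_000_000:10.1f} MB  {name}")
```

Calling report() on the root of a loaded view lists the directories that contribute most to the total, which makes it easier to show colleagues where the obsolete data sits.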
My Logic:
I think this is the best way to keep a stream of development clean. That is, the best way to organize a stream of development is to have the best, the cleanest and the leanest source code available.
Also, since all HISTORY will be available always in ClearCase, there is no need to panic about the deletion of elements.
I feel old, redundant and obsolete code and artifacts belong in HISTORY and not in the current stream of development.
Lastly, I feel ClearCase operations like making a baseline etc will take more time if we have bloat in the VOB. Since we do an incremental baseline for nightly builds, I do not think these obsolete items are baselined. But I feel all ClearCase operations are affected by bloat.
Is my LOGIC proper? Is my understanding of UCM ClearCase proper?
*Please let me know the best practice in such cases.*
People at my work place do not want to delete the obsolete files although 5 GB data is obsolete in the current stream.
Any help would be appreciated.

The best practice is actually separate from UCM in this case.
I too started by storing third-party binaries in ClearCase. It didn't scale well: the VOB became bloated and simply too large to be managed (i.e. backed up) easily.
I now prefer storing third-party binaries in an artifact repository like Nexus, and adding a small Maven script to my build process to download the right binaries at the right versions, as declared in a pom.xml file.
Note that to remove old versions of a binary from a VOB, rmelem or rmver is really not advisable (risk of hyperlink corruption), but I used to do:
cleartool rmver -data aLargeBinary@@/main/.../branch/OldVersion
That would keep the version in ClearCase, but would remove the version's content (i.e. the large binary itself), which allowed the VOB to get much smaller.
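When many old versions are involved, it may help to generate the commands first and review them before running anything. This is a hedged Python sketch of such a dry run: it only builds and prints command lines and never calls cleartool. It assumes ClearCase's version-extended pathname form (element, then @@, then the version path); the element and version names in the docstring are hypothetical.

```python
import shlex


def rmver_data_command(element, version_path):
    """Build the argument list for `cleartool rmver -data` on one version.

    element      -- path of the element, e.g. "lib/thirdparty.jar" (hypothetical)
    version_path -- branch path and version, e.g. "/main/rel1/3" (hypothetical)
    """
    if not version_path.startswith("/"):
        raise ValueError("version path must start with '/', e.g. /main/3")
    # "@@" joins an element path to a version in a version-extended pathname.
    return ["cleartool", "rmver", "-data", f"{element}@@{version_path}"]


def print_plan(old_versions):
    """Dry run: print one command per (element, version_path) pair."""
    for element, version_path in old_versions:
        print(shlex.join(rmver_data_command(element, version_path)))
```

Printing the plan first gives the team a list to approve, which matters because removing version data cannot be undone from within ClearCase.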
That being said, I agree with your general policies (especially regarding redundant code)

Related

Reintegrate a branch back to the trunk when sweeping changes have been made to the tree structure

A brief note before I start: there is a lot of explanation required to "set the stage", and it may seem like this is more of a design question than a question about a programming problem. The question is actually about SVN branching and merging, so please read to the end.
Scenario:
I have a large Visual Studio solution with quite a few projects. I'm using SVN, so of course the trunk has my production line of development. This consists of a core DLL assembly, a "main" UI user client, and a handful of "plugin" assemblies that operate by implementing interfaces on the core assembly in order to provide functionality within the UI, and also by utilizing a set of service methods which provide common functionality to all of the plugins (such as persistence logic operations, storage operations for a centralized file store architecture, etc.)
There are also external utilities that I have built over time which must duplicate a lot of the business logic in the plugins. I won't go into much detail because it will ultimately distract from my main question, but just picture, for example, a scheduled service on a server that handles centralized maintenance operations related to a particular plugin's data.
When I initially built this application, I (stupidly) didn't anticipate the need for centralized service tiers, so I architected the core assembly (for better or worse), as shown above, to be tightly integrated with the presentation layer of the application. In other words, the UI presentation logic needed to integrate the plugins with the user interface and the business logic needed by the plugins to perform common plugin logic operations is all part of the one "core" assembly. Therefore, much of the "shared" logic that exists between the plugins and the centralized services has resulted in duplicated code.
I decided to undertake the major refactoring initiative to pull out the common logic -- that which is not related to the presentation -- into a "shared" assembly. For this, I created a branch off the trunk. I reorganized common code into a "shared" assembly, and I re-pointed everything in the client application (plugins, etc.) and the external service applications to utilize the shared assembly. In many cases, I also had to rename classes in order to fit their more-general purpose going forward. The core assembly remained in place only to broker presentation-layer responsibilities between the plugins and the UI.
Problem:
Now that I have successfully completed the refactoring, I want to reintegrate the branch back into the trunk. Merging is tricky business even in simple cases, but what I'm facing here is a lot of tree conflicts to put it mildly. Also, in addition to residing in an entirely new project, the folder structure in the "shared" project is quite a bit different from what it was in the "core" project. Classes are, in many cases, located in different places due to the new mechanisms for using the shared assembly.
I want to maintain the version history of every class from its old home in the core assembly to its new home in the shared assembly. Furthermore, I want to guarantee that the merge is successful. That seems obvious, but in testing a miniature version of this whole scenario, I was never able to get the conflicts to resolve in such a way where my branch features remained entirely intact. Furthermore, the fact that I have renamed some of the classes, as I stated earlier, to suit their more-general roles, makes it very tricky to maintain the version history.
I will note that I am using AnkhSVN which helps in "normal" cases when you rename files to repair the moves, but it doesn't seem to work in these major tree-conflict cases. Also, I know there is a difference in how merges work between different versions of SVN -- I believe it's pre-SVN 1.5 and post-SVN 1.5. I'm using SVN 1.9.3.
I have been trying to figure this out for a few weeks now. I've been poring over the SVN book, TortoiseSVN resources like this, and anything I could find from Google searches, like this, this, and this -- among many, many, many others. I feel like I'm going crazy, and I think advanced SVN (and Tortoise) is impossible to learn with the traditional teach-yourself, learn-from-the-web-and-books approach. At any rate, I would greatly appreciate any insight that is out there.
What is the proper methodology when you create a feature branch using SVN and plan on making major tree changes and "moves" (i.e. renames) so that you can reintegrate those changes with the trunk without losing anything?
Congratulations on stepping on the most "popular" rake in SVN - "Merge Hell after refactoring"!
There are (at least) two simple rules for your case, produced by bitter experience:
1) Never perform refactoring in SVN
2) If you ignore rule 1: in the name of all that is holy and good in the world, don't touch ANYTHING in trunk during refactoring in the branch
If you reject these righteous covenants, you still have ways to salvation
Pure SVN-way, long and dirty
Merge each and every subtree that is a source of a tree conflict separately, working out every source and target by hand, like
svn merge NEW_PATH/NEW_NAME old_path/old_name
and finish this bloody work with a full merge
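The per-subtree merges can be listed from a hand-made map of moves before any of them is run. A minimal Python sketch, assuming you have already worked out which new path corresponds to which old path; the function name and any paths you pass in are illustrative, and the command shape simply follows the svn merge line above.

```python
import shlex


def subtree_merge_commands(moves):
    """Build one `svn merge` command per moved subtree.

    moves maps the path on the branch (new location and name) to the
    working-copy path on trunk (old location and name).
    """
    commands = []
    for new_path, old_path in moves.items():
        commands.append(["svn", "merge", new_path, old_path])
    return commands


def print_commands(moves):
    """Print the commands for review; nothing is executed."""
    for command in subtree_merge_commands(moves):
        print(shlex.join(command))
```

Keeping the map in one place also documents every rename, which is useful when the final full merge still reports conflicts.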
Tricky Mercurial-way (or Git-way, but I just hate Git)
Preface: such merges aren't a problem at all for modern DVCSes, and they have "bridges" to SVN repos. Thus you can delegate the job of merging to an external VCS of your choice and bring the results back (with some limitations and warnings)
I'm too lazy to cover all DVCSes and will explain only Mercurial (considering that, with an SVN background, it will be the least painful migration).
With HGSubversion, Mercurial can read (pull) from and write (push) to Subversion repositories, but it can't push the results of its own merges to Subversion. Thus it will be a multi-stage operation in which the Subversion WC is replaced along the way
A brief synopsis
1) Install Mercurial (TortoiseHG) and the HGSubversion extension
2) Clone the whole SVN repository to Mercurial in some temporary location (not the current Subversion WC)
3) Merge the branch to mainline (SVN's trunk becomes the default branch), resolve (possible) context conflicts (not tree conflicts)
4) Test (?) the results
5) Completely replace the Subversion Working Copy (WC of trunk, obviously) with the content of the Mercurial Working Directory (beware of the .svn and .hg folders respectively)
6) Commit the WC to trunk
7) For beauty and compliance with all rules, "cheat" the mergeinfo data of trunk (the revision committed in step 6 must be known later as a mergeset, although formally that is not true)
HTH
PS - migration to Mercurial with HGVS doesn't seem like a totally crazy idea for now

How to implement "Lock & Edit" mechanism for Visual Studio? GitHub, SVN, VSS, TFS?

Here's the requirement:
C# classes need to be shared among a group of 5 developers.
If one developer starts editing a class, it should be automatically locked for others
Others can edit that class, only when the current developer releases the class
I understand that Git is a distributed version control system, whereby complete local repositories are created. Merge functionality has to be used for creating a consolidated file.
I have also tried Svn, but even that uses a Merge tool.
I have a small team, and I don't want to use Merge Tools. Which is the best way to accomplish this?
SVN does support this kind of workflow with its locking feature.
Read the section on locking in the SVN Book v 1.7 - it goes into plenty of detail.
As far as I'm aware, Git does not support a locking workflow.
Apparently Team Foundation Server also supports a locking workflow, but I'm not familiar with it.
I will add that I do not think this is a good way to work unless you absolutely have to (e.g. binaries or hard-to-merge files like model XML). Regular team communication and defensive programming should mean that the vast majority of code merges will be handled automatically by your version control system.
Merging is just a part of collaborative development. Nobody really wants to use merge tools, but IMO having to do an occasional (sometimes messy) merge is a far better prospect than having to wait until someone else is finished with a file before I can make my change - changes which are very likely NOT to conflict with others' changes anyway. Especially in a small team.
You should also not (as mentioned in comments above) need a resource dedicated to Merging. A merge is best done by two people.
The developer with the conflict, and
The developer who committed the last change (that has caused that conflict.)
If these two can't work it out pretty quickly, or you really do need a resource just for merging (which I have seen occur even in smallish teams of around 10 developers), you have problems, such as:
The code is monolithic/highly coupled and needs refactoring
The developers are not committing atomic changes.
Using svn and a complex branching strategy (scary)
Developers are not talking to each other (Just a 10 min standup/day would help)
Good luck!
Apache Subversion 1.8 features major improvements that make merging and solving conflicts easier. New automatic merges are definitely worth testing!
As @mounds already mentioned, you can use a pessimistic-locking kind of workflow with Apache Subversion. See the SVNBook | Lock communication section. In that case Visual Studio with VisualSVN will prompt you to lock a file before you start modifying it.
Note that this approach should be used for files that can't be merged. So, Embrace Merge!
Users and administrators alike are encouraged to attach the svn:needs-lock property to any file that cannot be contextually merged. This is the primary technique for encouraging good locking habits and preventing wasted effort.
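Attaching the property by hand to every unmergeable file gets tedious, so here is a rough Python sketch that walks a working copy and builds the matching svn propset commands without running them. The extension list is purely an assumption for illustration; choose the file types that cannot be merged in your own project.

```python
import os

# Extensions treated as unmergeable here are an assumption; adjust per project.
UNMERGEABLE_EXTENSIONS = {".png", ".dll", ".docx", ".xlsx"}


def needs_lock_commands(root, extensions=UNMERGEABLE_EXTENSIONS):
    """Return `svn propset svn:needs-lock` commands for matching files under root."""
    commands = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Subversion's own metadata directory and keep the order stable.
        dirnames[:] = sorted(d for d in dirnames if d != ".svn")
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in extensions:
                path = os.path.join(dirpath, name)
                # Subversion ignores the property value; "*" is commonly used.
                commands.append(["svn", "propset", "svn:needs-lock", "*", path])
    return commands
```

Each returned list can be passed to subprocess.run once reviewed; the property change then has to be committed like any other modification.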

Subversion very slow

I am working on a project where the branches folder contains at least 300 different branches (copies of the trunk) that will no longer be used. Since SVN is running more and more slowly, I wonder whether deleting those branches will make Subversion operate faster.
Other people on my team say that since the source code will still be on the server, it won't change anything. (So the branches stay undeleted.)
But I read something about Subversion before (I don't remember where) saying that HEAD is managed a little differently from previous revisions, which could increase the speed of the repository.
Which of these holds true?
Subversion performance is more related to the load on your server than the size of the repository. Check on disk space and CPU performance, as well as looking into the web server performance (or svnserve on Windows).
If you remove branches, earlier revisions in the repository still contain them, so their content is not actually removed. The only way to actually remove content is to dump the repository (svnadmin dump) and then use svndumpfilter to remove the branches in question from the dumped content. The resulting content can be loaded into a new repository without the removed content, and even the revision numbers can be updated.
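The dump, filter and load cycle can be written down as a short command plan. The Python sketch below only assembles the shell commands as strings so they can be reviewed; the repository paths, branch names and dump file name are hypothetical placeholders supplied by the caller.

```python
import shlex


def prune_commands(old_repo, new_repo, paths_to_drop, dump_file="repo.dump"):
    """Return the shell commands for a dump / filter / load cycle as strings."""
    filtered = dump_file + ".filtered"
    excludes = " ".join(shlex.quote(p) for p in paths_to_drop)
    return [
        f"svnadmin dump {shlex.quote(old_repo)} > {shlex.quote(dump_file)}",
        # --drop-empty-revs removes revisions left empty by the filter;
        # --renumber-revs closes the resulting gaps in revision numbers.
        f"svndumpfilter exclude --drop-empty-revs --renumber-revs {excludes} "
        f"< {shlex.quote(dump_file)} > {shlex.quote(filtered)}",
        f"svnadmin create {shlex.quote(new_repo)}",
        f"svnadmin load {shlex.quote(new_repo)} < {shlex.quote(filtered)}",
    ]
```

One known limitation to plan for: svndumpfilter stops with an error if a path that is kept was copied from a path that is being excluded, so branches that were later merged back by copy may need to stay.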
I am not aware of the HEAD being handled differently in terms of performance. However, copies of the HEAD (or anything else) are cheap, lightweight copies, and should not affect performance.
Can you provide any additional information on which specific operations are slowing down?

How do you manage .vcproj files in source control which are changed by multiple developers?

We use Subversion as our source control system and store the VisualStudio project files (vcproj) in the source control system as is normal I think. With Subversion we don't use any form of file locking, so if two developers are working on the same project at the same time and both add files to the project, or change settings, the second one to commit has to merge the changes.
How do you merge these changes?
The vcproj files are just text files so it is possible to edit them by hand but they are not very amenable to hand editing, especially by junior developers.
The ways I can think of are:
1) Get the latest version from svn and re-add all local changes manually
2) Edit the file by hand to resolve any conflicts from an automatic merge
3) Implement some form of locking scheme to prevent simultaneous changes
4) Have an agreement between developers so they do not make simultaneous changes
Currently we are using the first option of re-adding all changes manually but this is time consuming and I was wondering if there is a better way.
With source files the automatic merge feature works most of the time and we don't get many conflicts.
I've found that option 2 (edit the files by hand) generally works fairly well, as long as you're using a good diff tool (I use WinMerge). The main problem I've run into is that Visual Studio will sometimes reorder the file. But, if you have a good diff/merge tool then it should be able to differentiate between changed content and moved content. That can help a lot.
You might find Project: Merge or Tools for SLN file useful
This is a tough problem and I think a weakness in the Visual Studio architecture. The way we found round it was to not have the proj files in source control at all and to have a build script that handled the configuration settings.
The alternative was very messy and we could not guarantee consistent builds or environments between developers. This led to a huge number of downstream integration problems and eventually we took the draconian step of removing the project files from source control.
The developers' environments could still become misaligned, but it showed up when they tried to build things themselves.
Using TFS here, but I don't think it makes a difference.
We also don't lock, and sometimes have to deal with merging project files. I've never found it to be that complex or much of an issue. Rarely do we ever experience issues that can't be merged automatically, and the manual merge process is pretty much trivial.
There's only one caveat to this: Check in often! If you make major changes to the project structure and don't check them in immediately those changes can start compounding the complexity of later merges. If I make a major change to the structure of a project, I usually give everybody a heads up. I'll ask them all to check in their current work, and then take care of the merge myself.
I found this recently: http://www.codeproject.com/KB/macros/vcproj_formatter.aspx
If you run this tool on a vcproj file and on a modified version of it then you can merge them together easily with your favorite text merge tool, and in addition the result is a more compact pretty vcproj file.
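The idea of normalizing both copies before merging can be illustrated in a few lines. This is not the CodeProject tool, only a small Python sketch of the same principle: it puts the entries under the Files and Filter elements of a .vcproj into a stable order so that a reordering by Visual Studio no longer shows up as a difference. It assumes the usual layout with Filter elements carrying a Name attribute and File elements carrying a RelativePath attribute.

```python
import xml.etree.ElementTree as ET


def _sort_key(element):
    # Filters are keyed by Name, files by RelativePath; compare case-insensitively.
    label = element.get("Name") or element.get("RelativePath") or ""
    return (element.tag, label.lower())


def normalize_vcproj(xml_text):
    """Return the project XML with file and filter entries in a stable order."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of the elements first, then reorder children in place.
    for container in list(root.iter()):
        if container.tag in ("Files", "Filter"):
            container[:] = sorted(container, key=_sort_key)
    return ET.tostring(root, encoding="unicode")
```

The output drops the XML declaration and any comments, so it is best used only as input for a diff or merge tool rather than written back over the project file.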
Options 1 and 2 are not mutually exclusive - if the developer is junior level, let them use option 1 (re-get the project file and re-do the changes) if that's more comfortable for them. For more senior developers, option 2 (merge using a merge tool) is perfectly fine.
I think this is a situation that currently has no magic bullet - sometimes merging is a pain.
We use a diff tool (WinMerge) to merge changes. The project files are (for the most part) really straight-forward XML. The key here, though, is that there never should be any surprises when merging, because good communication is part of the bed-rock of effective source control.
Simultaneous changes to the project are perfectly fine as long as people communicate.

Lightweight version control for small projects (prototypes, demos, and one-offs) [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Background
I work on a lot of small projects (prototypes, demos, one-offs, etc.). They are mostly coded in Visual Studio (WPF or ASP.NET with code written in C#). Usually, I am the only coder. Occasionally, I work with one other person. The projects come and go, usually in a matter of months, but I have a constantly evolving set of common code libraries that I reuse.
The problem
I've tried to use source control software before (SourceGear Vault), but it seemed like a lot of overhead when working on a small project, especially when I was the only programmer. Still, I would like some of the features that version control offers.
Here's a list of features I'd like to have:
1) Let me look at any file in an older version of my project instantly. Please don't force me through the rigmarole of (1) checking in my current work, (2) reverting my local copy to the old version, and (3) checking the current version back out so I can once again work on it.
2) In fact, if I'm the only one on the project, I don't ever want to check out. The only thing I want to be able to do is say, "Please save what I have now as version 2.5."
3) Store my data efficiently. If I have 100 MB of media in my project, I don't want that to get copied with every new version I release. Only copy what changes.
4) Let me keep my common library code files in a single location on my hard drive so that all my current projects can benefit from any bug fixes or improvements I make to my library. I don't want to have to keep copying my library to other projects every time I make a change.
5) However, do let me go back in time to any version of any project and see what the source code (including the library code) looked like at the time that version was released.
6) Please don't make me store a special database server on my machine that makes my computer take longer to start up and/or uses resources when I'm not even programming.
Does this exist?
If not, how close can I get?
Edit 1: TortoiseSVN impressions
I did some experimenting with Subversion. A couple observations:
Once you check something in to a repository, it does stuff to your files. It puts these hidden .svn folders inside your project folders. It messes with folder icons. I have yet to get my project back to "normal". Unversioning a working copy got me part of the way there, but I still have folders with blue question mark icons. This makes me grumpy :-/ Update: finally got rid of the folder icons by manually creating new folders and copying the folders over. (Not good.)
I installed the open source plugin for Visual Studio (AnkhSVN). After creating a fresh repository on my hard drive, I attempted to check in a solution from Visual Studio. It did exactly what I was afraid it would do. It checked in only the folders and files that are physically (from the POV of the file system) inside my solution folder. In order to accomplish item #5 above, I need all source code used by the solution to be checked in. I attempted to do this by hand, but it wasn't a user-friendly process (for one thing, when I selected multiple library projects at once and attempted to check them in, it only appeared to check in the first one). Then, I started getting error dialogs when I tried to check in subsequent projects.
So, I'm a little frustrated with SVN (and its supporting software) at this point.
Edit 2: TortoiseHG impressions
I'm trying out Mercurial now (TortoiseHG). It was a little bit difficult to figure out at first, no better or worse than TortoiseSVN I'd say. I noticed an RPC Server on startup (relates to item 6). I figure it should be possible to turn this off if I'm not sharing anything with anyone, but it wasn't something I could figure out just by looking at the options (will check out the help later).
I do appreciate having my local repository as just a single .hg folder. And, simply throwing the folder in the Recycle Bin seemed to be all I needed to do to return everything back to normal (i.e., unversion my project). When I check in (commit), it seems to offer a simple comment window only. I thought maybe there would be a place to put version numbers.
My (probably not very clever) attempt to add a Windows shortcut (a folder aliasing my library projects) failed, not that I really thought it would work :) I thought maybe this would be a sneaky way to get my library projects (currently located elsewhere) included in the repository. But no. Maybe I'll try out "subrepos", but that feature is under construction. So, iffy that I'll be able to do items 4 and 5 without some manual syncing.
Any of the distributed source control solutions seem to match your requirements. Take a look at bazaar, git or mercurial (already mentioned above). Personally I have been using bazaar since v0.92 and have no complaints.
Edit: Heck, after looking at it again, I'm pretty sure any of those 3 solutions handles all 6 of your requested features.
Distributed Version Control Systems (Mercurial, Bazaar, Git) are nice in that they can be completely self-contained in a single directory (.hg, .bzr, .git) in the top of the working copy, where Subversion uses a separate repository directory, in addition to .svn directories in every directory of your working copy.
Mercurial and Subversion are probably the easiest to use on Windows, with TortoiseHG and TortoiseSVN; the Bazaar GUIs have also been improving. Apparently there is also TortoiseGit, though I haven't tried it. If you like the command line, Easy Git seems to be a bit nicer to use than the standard git commands.
I'd like to address point 4, common libraries, in more detail. Unfortunately I don't think any of them will be too easy to use, since I don't think they're directly supported by GUIs (I could be wrong). The only one of these I've actually used in practice is Subversion Externals.
Subversion is reasonably good at this job; you can use Externals (see the chapter in the SVN book), but to associate versions of a project with versions of a library you need to "pin" the library revision in the externals definition (which is itself versioned, as a property of the directory).
Mercurial supports something similar, but both solutions seem a bit immature: subrepository support built-in to the latest version and the "Forest Extension".
Git has "submodule" support.
I haven't seen anything like sub-repositories or sub-modules for Bazaar, unfortunately.
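To make the "pinning" of a Subversion external concrete, here is a small Python sketch that formats one line of an svn:externals definition with a fixed revision. It assumes the Subversion 1.5+ ordering (optional revision, then URL, then local directory); the URL and directory in the usage note are hypothetical.

```python
def pinned_external(url, local_dir, revision):
    """Format one svn:externals line that pins local_dir to a fixed revision."""
    if revision < 1:
        raise ValueError("revision must be a positive integer")
    # Subversion 1.5+ order: optional -rREV, then the URL, then the local directory.
    return f"-r{revision} {url} {local_dir}"
```

For example, pinned_external("http://svn.example.com/repos/libs/common", "common", 148) produces a line that could be set with svn propset svn:externals on the project directory. Because the definition is itself versioned, checking out an old project revision also brings back the library revision that was pinned at the time.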
I think Fog Creek's new product, Kiln, will get you pretty close. In response to your specific points:
1) This is easily done through the web interface -- you don't need to touch your local copy or update. Just find the file you want, click the revision you want to see, and your code will be in front of you.
2) I'm not sure you can do things exactly like "Please save this as version 2.5", but you can add unique tags to changesets that allow you to identify a special revision (where "special" can mean whatever you want it to).
3) Mercurial does a great job of this already (which Kiln uses in the back end), so there shouldn't be any problems in this regard.
4) By creating different repositories, you can easily have one central 'core' section which is consistent across various projects (though I'm not entirely sure if this is what you're talking about).
5) I think most version control systems allow you to do this...
6) Kiln is hosted, so there's no hit on performance to your local machine. The code you commit to the system is kept safe and secure.
Best of all, Kiln is free for up to two licenses by way of their Student and Startup Edition (which also gets you a free copy of FogBugz).
Kiln is in public beta right now -- you can request your account at my first link -- and users are being let in as more and more problems are resolved. (For some idea of what current beta users are saying, take a look at the Kiln Knowledge Exchange site that's dedicated to feedback.)
(Full Disclosure: I am an intern currently working at Fog Creek)
For your requirements I would recommend subversion.
Let me look at any file in an older version of my project instantly. Please don't force me through the rigmarole of (1) checking in my current work, (2) reverting my local copy to the old version, and (3) checking the current version back out so I can once again work on it.
You can use the repository browser of TortoiseSVN to navigate to any existing version easily.
In fact, if I'm the only one on the project, I don't ever want to check out. The only thing I want to be able to do is say, "Please save what I have now as version 2.5."
This is done by svn copy . svn://localhost/tags/2.5.
Store my data efficiently. If I have 100 Mb of media in my project, I don't want that to get copied with every new version I release. Only copy what changes.
Given by subversion.
Let me keep my common library code files in a single location on my hard drive so that all my current projects can benefit from any bug fixes or improvements I make to my library. I don't want to have to keep copying my library to other projects every time I make a change.
However, do let me go back in time to any version of any project and see what the source code (including the library code) looked like at the time that version was released.
Put your libraries into the same svn repository as the rest of your code and you'll have global revision numbers that let you switch everything back to a common state.
Please don't make me store a special database server on my machine that makes my computer take longer to start up and/or uses resources when I'm not even programming.
You only have to start svnserve to start a local server. If you only work on one machine you can even do without this and use your repository directly.
I'd say that Mercurial along with TortoiseHg will do what you want. Of course, since you don't seem to be requiring much, subversion with TortoiseSvn should serve equally well, if you only ever work alone, though I think mercurial is nicer for collaboration.
Mercurial:
1) hg cat --rev 2.5 filename (or "Annotate Files" in TortoiseHg)
2) hg commit ; hg tag 2.5
3) Mercurial stores (compressed) diffs (and "keyframes" to avoid having to apply ten thousand diffs in a row to find a version of a file). It's very efficient unless you're working with large binary files.
4) Symlink the library into all the projects?
5) OK, now that I read this point I'm thinking Mercurial's Subrepos are closer to what you want. Make your library a repository, then add it as a subrepository in each of your projects. When your library updates you'll need to hg pull in the subrepos to update it, unfortunately. But then when you commit in a project Mercurial will record the state of the library repo, so that when you check out this version later to see what it looked like you'll get the correct version of the library code.
6) Mercurial doesn't do that, it stores data in files.
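For the subrepository idea, Mercurial reads the declarations from a .hgsub file in the root of the project repository, one "path = source" line per subrepository. A tiny Python sketch that produces that file's text follows; the paths passed in are whatever you choose, and the ones in the usage note are hypothetical.

```python
def hgsub_content(subrepos):
    """Return the text of a .hgsub file for {local path: source} pairs."""
    lines = [f"{path} = {source}" for path, source in sorted(subrepos.items())]
    return "\n".join(lines) + "\n"
```

For instance, hgsub_content({"libs/common": "../common"}) gives a single line mapping the libs/common directory of the project to a sibling library repository. After the file is added and committed, Mercurial records the exact library changeset in .hgsubstate with each project commit, which is what makes the old project versions reproducible.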
Take a look at Fossil; it's a single exe file.
http://www.fossil-scm.org
As people have pointed out, nearly any DVCS will probably serve you quite well for this. I thought I would mention Monotone since it hasn't been mentioned already in the thread. It uses a single binary (mtn.exe), and stores everything as a SQLite database file, nothing at all in your actual workspace except a _MTN directory on the top level (and .mtn-ignore, if you want to ignore files). To give you a quick taste I've put the mtn commands showing how one carries out your wishlist:
Let me look at any file in an older version of my project instantly.
mtn cat -r t:1.8.0 readme.txt
Please save what I have now as version 2.5
mtn tag $(mtn automate heads) 2.5
Store my data efficiently.
Monotone uses xdelta to only save the diffs, and zlib to compress the deltas (and the first version of each file, for which of course there is no delta).
Let me keep my common library code files in a single location on my hard drive so that all my current projects can benefit from any bug fixes or improvements I make to my library.
Monotone has explicit support for this; quoting the manual "The purpose of merge_into_dir is to permit a project to contain another project in such a way that propagate can be used to keep the contained project up-to-date. It is meant to replace the use of nested checkouts in many circumstances."
However, do let me go back in time to any version of any project and see what the source code (including the library code) looked like at the time that version was released.
mtn up -r t:1.8.0
Please don't make me store a special database server on my machine
SQLite can be, as far as you're concerned, a single file on your disk that Monotone stores things in. There is no extra process or startup craziness (SQLite is embedded, and runs directly in the same process as the rest of Monotone), and you can feel free to ignore the fact that you can query and manipulate your Monotone repository using standard tools like the sqlite command line program or via Python or Ruby scripts.
Try Git. Lots of positive comments about it on the Web.

Resources