I have been optimizing our continuous integration builds, and the remaining bottleneck seems to be the following ClearCase command:
cleartool.exe mklbtype -nc -ordinary BUILD_ApplicationWorkspace_1.0.0.0@\vob_example
For a view with 1800 files, this is taking over 6 minutes to complete. Our MSBuild task takes half that. I am guessing the bulk of the bottleneck is network bandwidth, but I also suspect the way we are labeling the files used in this build.
Based on this, I have some questions:
Are we efficiently labeling the source code files, or is there a more efficient command we can run?
How can I get better metrics to understand where this ClearCase command is spending the bulk of its time?
Do prior labels slow ClearCase labeling down?
Relatedly, does ClearCase have anything similar to Git submodules or svn:externals? Currently we create a view of everything, including dependencies, before we do the build.
Thanks for your help.
cleartool mklbtype shouldn't take that long: it creates the label type; it does not apply the label to each and every one of your files.
If anything, it is mklabel that should take time.
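For illustration, here is roughly what the two steps look like from the command line (the label name and VOB tag are taken from the question; the view path is invented, so adjust to your environment). Creating the type is a one-off metadata operation; mklabel is the part that visits every element:
:: Create the label type once per VOB (cheap):
cleartool mklbtype -nc -ordinary BUILD_ApplicationWorkspace_1.0.0.0@\vob_example
:: Apply the label to the elements in the view (this is what scales with your 1800 files):
cleartool mklabel -recurse BUILD_ApplicationWorkspace_1.0.0.0 M:\my_view\vob_example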
Applying the UCM methodology (as opposed to your current "Base ClearCase" usage) might help in that:
it forces you to define "components" (coherent groups of files, i.e. not "1800 files", which is quite a large set)
it proposes "incremental baselines", which only label what has changed.
the UCM components are a bit akin to Git submodules (you can group them in a composite baseline), but that is not the same as svn:externals, as mentioned here and in "Why are git submodules incompatible with svn externals?".
But if you are stuck with Base ClearCase, you are stuck with labeling everything, and one avenue for optimization would be to label only a subset of those files.
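For example (paths purely illustrative), if only a couple of directories actually feed this build, you could point mklabel at them instead of the whole VOB:
cleartool mklabel -recurse BUILD_ApplicationWorkspace_1.0.0.0 M:\my_view\vob_example\src M:\my_view\vob_example\build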
Lately I was asked by my boss to explore what OpenGrok can do for the company I work for. I first started with a few projects on my VirtualBox Lubuntu VM; it worked OK, but somewhat slowly. I blamed my laptop's mediocre specs for that.
Now I have a virtual machine of bigger proportions and I'm also running the indexing on a larger volume of data (an SVN repository with 100 different projects, some of them with multiple branches, tags and trunk, about 100,000 files in total, a few GB in size). All files are checked out directly into SRC_ROOT.
I was hoping for reasonably fast indexing, but it's been running for more than five days now. I can see multiple threads running via htop, but CPU usage is 0.5-2.5% and memory usage 0.9%, so I guess it's not an issue of computing power. And unless the HDDs are terribly slow, I don't know what the problem is.
Furthermore, the indexing process seems to be slowing down. At the beginning it was approximately 1 sec/file; now it is about 5 sec/file. Unfortunately I didn't enable the progress option, so I have no idea how long it's still going to run.
Any ideas how to make indexing faster? How to use resources more effectively? Current speed is simply unusable...
I think an easy way to improve performance is to run the OpenGrok indexer with JAVA_OPTS set up and a 64-bit Java.
Also, using Derby for storing the generated index data increases performance too.
More info on how to set up and use OpenGrok:
https://github.com/OpenGrok/OpenGrok/blob/master/README.txt#L862
https://java.net/projects/opengrok/lists/discuss/archive/2013-03/thread/1#00000
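As a minimal sketch (paths, heap size and jar location are assumptions; check the indexer's usage output for your OpenGrok version), running the indexer on a 64-bit JVM with a bigger heap looks something like this:
# If you use the OpenGrok wrapper script, give the JVM more heap via JAVA_OPTS:
export JAVA_OPTS="-Xmx4g -server"
# Or invoke the indexer jar directly with the same heap settings:
java -Xmx4g -jar opengrok.jar -s /var/opengrok/src -d /var/opengrok/data -P -S -W /var/opengrok/etc/configuration.xml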
I think the problem is SVN. Try to debug and improve the speed of SVN access from your VM, or disable SVN (temporarily) altogether to get a fast index; you can add history to the index later, gradually, per project, even if it takes a few days (see the options on how to run the indexer per project).
Or, if you can mirror the SVN repo and make local svn calls, that should give you a boost too.
So to conclude: OpenGrok can detect SVN; you can skip history creation at first and just index the checkout, then add the history locally later, to avoid long waits while history is generated on the fly.
That said, git and hg seem to work well with OpenGrok in terms of history indexing.
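A hedged sketch of that two-pass approach (flag names as used by the opengrok.jar indexer of that era; the per-project trailing argument is an assumption, so verify against your version's usage output):
# Pass 1: index the sources only, no history cache - fast, and gives a usable index:
java -Xmx4g -jar opengrok.jar -s /var/opengrok/src -d /var/opengrok/data -P -S
# Later passes: enable history generation (-H), optionally one project at a time:
java -Xmx4g -jar opengrok.jar -s /var/opengrok/src -d /var/opengrok/data -P -S -H /project_foo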
I've been running into this myself, and I've found that the indexer is spending most (>90%) of its time querying the source control systems.
That said, some of the projects I index do use Perforce and SVN, and I don't want to disable them entirely, so what I've done is index twice -- first with all the options that involve source control disabled, and then again with everything enabled.
That way, it still takes a long time (several days, in my case), but at least I have a usable index up and running in a few hours, and then it can spend days working out all the history.
Subsequent index runs should be faster, as I would expect the historycache to be updated only for files that are newer than the cached history.
(That said, it would be nice if I could update the historycache externally so it's all ready to go before I start the indexer at all, and have the indexer configured to not look up history information at all, but instead to just index what's cached)
I was about to commit about 1000 files at once after some refactoring. Is it advisable to commit such a huge number of files, or should I commit them in batches? I am trying to weigh the pros and cons.
One of the pros is that I will have a single entry in SVN for all my changes, which will be easy to navigate.
With a number of files as small as 1000, I would worry less about performance and more about a correct workflow. 1000 files is a lot of files and thus a lot of changes, but Subversion should handle it reasonably well.
However, if all of the changes are not actually 1 change, then it should not be one commit. For example, if you're renaming 3 functions, I would make each rename a separate commit. Depending on what specifically you're doing, you may be able to get away with one commit, but a year from now when you're browsing through the logs, you'll make life easier on yourself if you tend to stick to small commits. If it really is only one change, then one commit is definitely your best option (for example, renaming one function).
SVN can handle 1000 files at once. The only reason to check in batches is to give each batch a different commit message, like "fixed bug #22" and "added flair".
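If you do want a different message per batch without juggling working copies, svn changelists (file and changelist names below are made up) let you group the files and commit each group separately:
# Group modified files into named changelists:
svn changelist bug-22 src/parser.c src/lexer.c
svn changelist flair src/ui/button.c
# Commit each group with its own message:
svn commit --changelist bug-22 -m "fixed bug #22"
svn commit --changelist flair -m "added flair"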
The number of files doesn't really matter.
When you commit changes to your code repo, you should be thinking of build stability and test compliance.
That answers your question: if you have made changes to n files and only commit some of them, then you're likely to break the build (not even talking about the tests). So you should commit all the necessary files together, to guarantee build integrity at least.
SVN and other tools are well capable of dealing with such a number of files, which will represent a single transaction on the server.
I'm trying to optimize my workflow as I still spend quite some time waiting for the computer when it should be the other way 'round IMO.
I'm supposed to hand in topical branches implementing a single feature or fixing a single bug, along with a full build log and regression test report. The project is huge, it takes about 30 minutes to compile on a fairly modern machine when compiling in a snapshot view.
My current workflow thus is to do all development work in a single snapshot view, and when a feature is ready for submission, I create a new dynamic view, merge the relevant changes from the snapshot and start the build/testing procedure overnight.
In a dynamic view, a full build takes about six hours, which is a major PITA, so I'm looking for a way to improve these figures. I've toyed with the cache settings, but that doesn't seem to make much difference. I'm currently pondering writing a script that will create a snapshot view with the same spec as the dynamic view, fetch the files into it and build there, but before I do that I wonder if there is a better way of improving my build times.
Can I somehow make MVFS cache all retrieved objects locally (I have lots of both hard disk space and RAM), ideally sharing the cache between multiple dynamic views (as I build feature branches, most files are bound to be identical between two different branches)?
Is there any other setting I could tune to speed up local builds?
Am I doing it wrong (i.e. is there a better workflow for me, considering that snapshot views take about one hour to create)?
Considering that you can have a dynamic view and a snapshot view with the same config spec, I would really recommend:
having a dynamic view ready for merge operation
then, once the merge is done, updating your snapshot view (no need to recreate it from scratch, which takes too much time. Just launch an update)
That way, you get the best of both worlds:
easy and quick merges within the dynamic view
"fast"(er) compilation within the snapshot view dedicated to that step.
Even if the config spec might have to change in your case (if you really have to use one view per branch), you still can change the config spec of an existing snapshot view (and still benefit from an incremental update), rather than recreating a snapshot view for each branch you need to compile on.
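A rough sketch of that flow (the snapshot view location and the config-spec file are placeholders):
:: Do the merge in the dynamic view as usual, then refresh the existing snapshot view:
cd /d D:\snapviews\my_snapshot_view
cleartool update -force
:: If you need another branch, change the config spec of the same snapshot view;
:: this triggers an incremental update rather than a full re-creation:
cleartool setcs ..\specs\feature_branch.cs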
I've a fairly large database project which contains nine databases and one database with a fairly large schema.
This project takes a large amount of time to build and I'm about to pull my hair out. We'd like to keep our database source controlled, but we're having a hard time getting the other devs to use the project and build the database project before checking in, just because it takes so long to build.
It is seriously crippling our work, so I'm looking for alternatives. Maybe something can be done with Redgate's SQL Compare? I think maybe the only drawback there is that it doesn't validate syntax? Anyone's thoughts/suggestions would be most appreciated.
Please consider trying SQL Source Control, which is a product designed to work alongside SQL Compare as part of a database development lifecycle. It's in Beta at the moment, but it's feature complete and it's very close to its full release.
http://www.red-gate.com/products/SQL_Source_Control/index.htm
We'd be interested to know how this performs on a commit in comparison to the time it takes for Visual Studio to build your current Database Project. Do you actually need to build the project so often in VS that it's a problem? How large is your schema and how long is an average build?
Keeping Dev/live db in sync:
There are probably a whole host of ways of doing this, I'm sure other users will expand further (including software solutions).
In my case I use a two fold approach:
(a) run scripts to get differences between the databases (stored procs, tables, fields, etc) - see the sketch at the end of this answer
(b) Keep a strict log of db changes (NOT data changes)
In my case I've built up a semi-structured log over time, like this:
Client_Details [Alter][Table][New Field]
{
EnforcePasswordChange;
}
Users [Alter][Table][New Field]
{
PasswordLastUpdated;
}
P_User_GetUserPasswordEnforcement [New][Stored Procedure]
P_User_UpdateNewPassword [New][Stored Procedure]
P_User_GetCurrentPassword [New][Stored Procedure]
P_Doc_BulkDeArchive [New][Stored Procedure]
Ignore the tabbing; the markdown has messed it up.
But you get the general gist.
I find that 99% of the time the log is all I need.
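For point (a), even a crude catalog dump is often enough; here is a minimal SQL Server sketch (server, database and file names are made up) that compares column definitions between dev and live:
:: Dump the column catalog of each database, then diff the two files:
sqlcmd -S devserver -d MyDb_Dev -Q "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS ORDER BY TABLE_NAME, COLUMN_NAME" -o dev_schema.txt
sqlcmd -S liveserver -d MyDb_Live -Q "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS ORDER BY TABLE_NAME, COLUMN_NAME" -o live_schema.txt
fc dev_schema.txt live_schema.txt
:: Repeat with INFORMATION_SCHEMA.ROUTINES to cover stored procedures.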
Here is my use case:
I start on a project XYZ, for which I create a work item, and I make frequent check-ins, easily 10-20 in total. ALL of the code changes will be code-read and code-reviewed.
The change sets are not consecutive - other people check-in in-between my changes, although they are very unlikely to touch the exact same files.
So ... at the end of the project I am interested in a "total diff" - as if there had been a single check-in by me to complete the entire project. In theory this is computable. From the list of changesets associated with the work item, you get the list of all files that were affected. Then, the algorithm can aggregate the individual diffs over each file and combine them into one. It is possible that a pure total diff is uncomputable because someone else renamed files, or changed things very close to, or in, the same functions as me. In that case ... I suppose the total diff could include those changes by non-me as well, and warn me about that fact.
I would find this very useful, but I do not know how to do it in practice. Can Visual Studio 2008/2010 (and/or TFS server) do it? Are there other source control systems capable of doing this?
Thanks.
You can certainly compute the 'total diff' yourself - make a branch of the project from the revision just prior to your first commit, then merge all your changesets into it.
I don't think this really is a computable thing in the general case - only contiguous changesets can be merged automatically like this. Saying it's 'unlikely' for others to have touched the files you're working on in the interleaving commits doesn't cut it; you need guarantees to be able to automate this sort of thing.
You should be working on a branch of your own if you want to be able to do this easily.
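With tf.exe, that looks roughly like this (server paths and changeset numbers are invented; run it from a mapped workspace):
:: Branch from the version just before your first changeset (say C1234 was your first):
tf branch $/Project/Main $/Project/ReviewBranch /version:C1233
tf checkin /comment:"Branch for review diff" /recursive
:: Replay each of your changesets onto the branch, oldest first:
tf merge /version:C1234~C1234 $/Project/Main $/Project/ReviewBranch /recursive
tf merge /version:C1240~C1240 $/Project/Main $/Project/ReviewBranch /recursive
tf checkin /comment:"My changes for the work item" /recursive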
The ability to generate diff information for display or for merge purposes is functionality provided by your version control system, as Mahesh Velaga commented on another answer. If you were able to compute the diff by cherry-picking non-contiguous changesets, then logically you would also be able to merge those changes in a single operation. But this is not supported by TFS. So I strongly suspect that the construction of the cherry-picked diff information is also not supported by TFS. Other version control systems (git, mercurial, darcs come to mind) might have more support for something like this; I don't know for sure.
From my reading of their answers on the TFS version control forums, I think that their recommendation for this would be to create a branch of your own for doing this work in the first place: then the changesets would be contiguous on that branch and creating the "total diff" would be trivial. Since it sounds like you are working on an independent feature anyway (otherwise a diff of only your changes would be meaningless), you should consider having an independent branch for it regardless of whether your version control system is TFS or something else.
The alternative is to construct what such a branch would have looked like after the fact, which is essentially what Jim T's answer proposes. You might prefer that approach if your team is very keen on everyone working in the same kitchen, as it were. But as you are already aware, things can get messy that way.
Create two workspaces. Do a Get Specific Version in each, specifying the date or the two changesets in question. Now compare the folders using a compare tool; Araxis Merge is the best one.
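Something along these lines (workspace names, local folders and changeset numbers are placeholders):
:: One workspace per version, each mapped to its own local folder:
tf workspace /new Before /noprompt
tf workfold /map $/Project C:\compare\before /workspace:Before
cd /d C:\compare\before
tf get /version:C1233 /recursive /force
tf workspace /new After /noprompt
tf workfold /map $/Project C:\compare\after /workspace:After
cd /d C:\compare\after
tf get /version:C1290 /recursive /force
:: Then point Araxis Merge (or any folder-diff tool) at the two local folders.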
Sounds like you need a tool that supports changesets (changes over multiple files, committed all at once) instead of committing each file alone.
Take a look at this comparison between SourceSafe and Mercurial (Mercurial is free, and you can find tools to integrate it with Visual Studio).