Optimizing Build in ClearCase Dynamic View - performance

I'm trying to optimize my workflow as I still spend quite some time waiting for the computer when it should be the other way 'round IMO.
I'm supposed to hand in topical branches implementing a single feature or fixing a single bug, along with a full build log and regression test report. The project is huge; it takes about 30 minutes to compile on a fairly modern machine when building in a snapshot view.
My current workflow thus is to do all development work in a single snapshot view, and when a feature is ready for submission, I create a new dynamic view, merge the relevant changes from the snapshot and start the build/testing procedure overnight.
In a dynamic view, a full build takes about six hours, which is a major PITA, so I'm looking for a way to improve these figures. I've toyed with the cache settings, but that doesn't seem to make much difference. I'm currently pondering writing a script that will create a snapshot view with the same spec as the dynamic view, fetch the files into it and build there, but before I do that I wonder if there is a better way of improving my build times.
Can I somehow make MVFS cache all retrieved objects locally (I have lots of both hard disk space and RAM), ideally sharing the cache between multiple dynamic views (as I build feature branches, most files are bound to be identical between two different branches)?
Is there any other setting I could tune to speed up local builds?
Am I doing it wrong (i.e. is there a better workflow for me, considering that snapshot views take about one hour to create)?

Considering that you can have a dynamic view and a snapshot view with the same config spec, I would really recommend:
having a dynamic view ready for merge operation
then, once the merge is done, updating your snapshot view (no need to recreate it from scratch, which takes too much time. Just launch an update)
That way, you get the best of both worlds:
easy and quick merges within the dynamic view
"fast"(er) compilation within the snapshot view dedicated for that step.
Even if the config spec might have to change in your case (if you really have to use one view per branch), you still can change the config spec of an existing snapshot view (and still benefit from an incremental update), rather than recreating a snapshot view for each branch you need to compile on.
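For example, a small script along these lines could drive that flow. This is a minimal sketch only: the dynamic view tag, the snapshot view root and the exact cleartool options are placeholder assumptions, not taken from your setup.

import subprocess
import tempfile

DYNAMIC_VIEW_TAG = "feature_merge_view"      # hypothetical dynamic view tag
SNAPSHOT_VIEW_ROOT = r"C:\views\build_snap"  # hypothetical snapshot view root

def run(args, cwd=None):
    print("+", " ".join(args))
    subprocess.run(args, cwd=cwd, check=True)

# 1. Capture the config spec of the dynamic view used for the merge.
cs = subprocess.run(["cleartool", "catcs", "-tag", DYNAMIC_VIEW_TAG],
                    capture_output=True, text=True, check=True).stdout

# 2. Apply it to the existing snapshot view; for a snapshot view, setcs
#    also triggers an incremental update, so the view is never recreated.
with tempfile.NamedTemporaryFile("w", suffix=".cs", delete=False) as f:
    f.write(cs)
    cs_file = f.name
run(["cleartool", "setcs", cs_file], cwd=SNAPSHOT_VIEW_ROOT)

# 3. Optionally force another incremental update before starting the build.
run(["cleartool", "update", "-force", "-overwrite", "."], cwd=SNAPSHOT_VIEW_ROOT)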

Related

Why do software updates exist?

I know this may sound crazy but hear me out...
Say you have a game and you want to update it (add new features / redecorate for seasonal themes / add LTMs etc.) Now, instead of editing your code and then waiting days for your app market provider (Google/Microsoft/Apple etc.) to approve the update and roll out the changes, why not:
Put all of your code into a database
Remove all of your existing code from your code files
Add code which can run code from a database (reads it in and eval()s it)
This way, there'd be no need for software updates unless you wanted to change your database-related code, and you could simply update your database to change what the app does when it's live.
My Question: Why hasn't this been done?
For example:
Fortnite (a real game) often has LTMs (Limited Time Modes) which are available for a few weeks and are then removed. Generally, the software updates are ~ 5GB and take a lot of time unless your broadband is fast. If the code was fetched from a database and then executed, there'd be no need for these updates and the changes could be instantaneous.
EDIT: (In response to the close votes)
I'm looking for facts and statistics to back up reasons rather than just pure opinions. Answers like 'I think this would be good/bad ...' aren't needed (that's what the comments are for); answers like 'This would be good/bad, as this fact shows that ...' are much better and desired.
There are a few challenges in your suggested approach.
Putting everything in a database will increase the size of the DB, but that doesn't matter much.
If all the code is in the DB, it's possible to decompile your software and find a way to connect to your code.
There is too much performance overhead in eval(); precompiled code is optimized for its respective runtime.
Multiple versions? What do you do when you want to have multiple versions of your software?
The main reason updates exist is that they are easy to maintain and flexible, and they allow the developer to ship the fastest, most optimized tool.
Put all of your code into a database
Remove all of your existing code from your code files
Add code which can run code from a database (reads it in and eval()s it)
My Question: Why hasn't this been done?
This is exactly how every game works already.
Each time you launch a game, an executable binary game engine (which you describe in step 3) already reads the rest of your code (often some embedded language like Lua) from a "database" (the file system) and "evals" (interprets) it to run the game, along with assets like level geometry, textures, sounds and music.
You're talking about introducing a layer of abstraction (a real database) between the engine and its data to hide some of your assets. But the database stores its data on the file system, so you really haven't gained anything: you've just changed the way the data is encoded at rest and queried at runtime, and introduced a ton of overhead in both cases.
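To make that concrete, here is roughly what the proposal boils down to, as a minimal Python sketch (the SQLite file and table layout are invented purely for illustration):

import sqlite3

# Minimal sketch of the proposed scheme: a tiny "engine" that pulls its real
# logic out of a database and evaluates it at runtime. The database file and
# table layout are invented for illustration.
conn = sqlite3.connect("game_logic.db")
conn.execute("CREATE TABLE IF NOT EXISTS modules (name TEXT PRIMARY KEY, source TEXT)")
conn.execute("INSERT OR REPLACE INTO modules VALUES (?, ?)",
             ("ltm_rules", "def score(kills, wins):\n    return kills + 10 * wins\n"))
conn.commit()

# "Engine" side: read the stored source and interpret it on the fly.
(source,) = conn.execute("SELECT source FROM modules WHERE name = ?",
                         ("ltm_rules",)).fetchone()
namespace = {}
exec(source, namespace)           # the eval() step from the question
print(namespace["score"](3, 1))   # prints 13

# game_logic.db is still an ordinary file on disk; an update to the stored
# source is still data that has to reach every client.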
On the other hand, you're intentionally cheating your way through the app review process this way, and whatever real technical problems you would have are moot, because your app would not be allowed in the app store. The entire point of the app review process is to prevent people from shipping unverified, unreviewed code to users, and if your program is obviously designed to circumvent this, your app will be rejected.
Fortnite (a real game) often has LTMs (Limited Time Modes) which are available for a few weeks and are then removed. Generally, the software updates are ~ 5GB and take a lot of time unless your broadband is fast.
Fortnite will have a small binary executable that is the game engine. Updates to this binary will account for a fraction of a percent of that 5GB. The rest will be some kind of interpreted/embedded language describing the game's levels (also a tiny fraction) and then assets, which account for the rest (geometry, textures, sound, music).
If the code was fetched from a database and then executed, there'd be no need for these updates and the changes could be instantaneous
This makes no sense. If you move that entire 5GB from the file system into a database, you still have to transfer around 5GB worth of database updates. 5GB of data in a database still lives as 5GB of data on the file system; it's just that you can't access it directly anymore. You have to transfer around the exact same amount of data, regardless of how you store it.

How to speed up OpenGrok indexing

Lately I was asked by my boss to explore OpenGrok's possibilities for the company I'm working for. I started with a few projects on a Lubuntu VirtualBox VM on my laptop; it worked OK, but was kind of slow. I blamed my laptop's mediocre specs for that.
Now I have a virtual machine of bigger proportions and I'm also running the indexing on a larger volume of data (an SVN repository with 100 different projects, some of them with multiple branches, tags and trunk; about 100,000 files in total, a few GB in size). All files are checked out directly into the SRC_ROOT.
I was hoping for reasonably fast indexing, but it has been running for more than five days now. I can see multiple threads running via htop, but CPU usage is 0.5-2.5% and memory usage 0.9%, so I guess it's not an issue of computing power. And unless the HDDs are terribly slow, I don't know what the problem is.
Furthermore, the indexing process seems to be slowing down: at the beginning it was approximately 1 sec/file, now it is about 5 sec/file. Unfortunately I didn't enable the progress option, so I have no idea how long it's still going to run.
Any ideas how to make indexing faster? How to use resources more effectively? Current speed is simply unusable...
I think an easy way to improve performance is to run the OpenGrok indexer with JAVA_OPTS set appropriately and using a 64-bit Java.
Also, using Derby for storing the generated index data increases performance too.
More info about how to use and set up OpenGrok:
https://github.com/OpenGrok/OpenGrok/blob/master/README.txt#L862
https://java.net/projects/opengrok/lists/discuss/archive/2013-03/thread/1#00000
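For example, something like the following could wrap the indexer run (a sketch only: the heap size, JVM path and wrapper location are assumptions for illustration, not values from your installation):

import os
import subprocess

# Give the (64-bit) JVM a generous heap via JAVA_OPTS before indexing.
# Paths and sizes below are placeholders.
env = os.environ.copy()
env["JAVA_OPTS"] = "-Xmx4g -server"
env["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"   # a 64-bit JVM

subprocess.run(["/opt/opengrok/bin/OpenGrok", "index", "/var/opengrok/src"],
               env=env, check=True)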
I think the problem is SVN. Try to debug and improve the speed of SVN access from your VM, or temporarily disable SVN altogether to get a fast index (you can add history to the index later, gradually, per project, even if it takes a few days; see the options on how to run the indexer per project).
Or, if you can mirror the SVN repo and make local SVN calls, that should give you a boost too.
So to conclude: OpenGrok can detect SVN, skip history creation (and enable it on the fly later), and just index the checkout; then you can add history locally afterwards to avoid long waits for history to be generated on the fly.
That said, Git and Hg seem to work well with OpenGrok in terms of history indexing.
I've been running into this myself, and I've found that the indexer is spending most (>90%) of its time querying the source control systems.
That said, some of the projects I index do use Perforce and SVN, and I don't want to disable them entirely, so what I've done is index twice -- first with all the options that involve source control disabled, and then again with everything enabled.
That way, it still takes a long time (several days, in my case), but at least I have a usable index up and running in a few hours, and then it can spend days working out all the history.
Subsequent indexes should be faster, as I would expect that the historycache is only updated for files that are newer than the cached history.
(That said, it would be nice if I could update the historycache externally so it's all ready to go before I start the indexer at all, and have the indexer configured to not look up history information at all, but instead to just index what's cached)

Faster ClearCase view labeling for Continuous Integration

I have been optimizing our continuous integration builds, and the remaining bottleneck seems to be the following ClearCase commands:
cleartool.exe mklbtype -nc -ordinary BUILD_ApplicationWorkspace_1.0.0.0#vob_example
For a view with 1800 files, this is taking over 6 minutes to complete. Our MSBuild task takes half that. I am guessing the bulk of the bottleneck is network bandwidth but also how we are labeling the files used in this build.
Based on this, I have some questions:
Are we efficiently labeling the source code files, or is there a more efficient command we can run?
How can I get better metrics to understand where this ClearCase command is spending the bulk of its time?
Do prior labels slow ClearCase labeling down?
Related, does ClearCase have anything similar to Git Sub-modules or svn:externals? Currently we are creating a view of everything, including dependencies, before we do the build.
Thanks for your help.
cleartool mklbtype shouldn't take that long: it is about creating the type of the label, not about applying it to each and every one of your files.
If anything, mklabel is what should take time.
Applying the UCM methodology (as opposed to your current "Base ClearCase" usage) might help in that:
it forces you to define "components" (coherent groups of files, ie not "1800 files", which is quite large)
it proposes "incremental baseline", which only labels what has changed.
the UCM components are a bit akin to Git submodules (you can group them in a composite baseline), but that is not the same as svn:externals, as mentioned here and in "Why are git submodules incompatible with svn externals?".
But if you are stuck with Base ClearCase, you are stuck with labeling everything, and one avenue for optimization would be to label only a subset of those files.
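If you do stay with labels, the split between the cheap type creation and the expensive label application looks roughly like this (a sketch; the VOB tag syntax and the directory subset are hypothetical, adapt them to your layout):

import subprocess

def ct(*args):
    """Run one cleartool command and fail loudly if it errors."""
    print("+ cleartool", " ".join(args))
    subprocess.run(["cleartool", *args], check=True)

LABEL = "BUILD_ApplicationWorkspace_1.0.0.0"

# Creating the label *type* is a one-off, cheap operation per VOB.
ct("mklbtype", "-nc", LABEL + "@\\vob_example")

# Applying the label is what actually walks the versions; restricting the
# recursion to the directories that really belong to this build (a made-up
# subset here) is where time can be saved.
for subset in [r"\vob_example\ApplicationWorkspace", r"\vob_example\SharedLibs"]:
    ct("mklabel", "-recurse", "-nc", LABEL, subset)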

Looking for alternatives to the database project

I've a fairly large database project which contains nine databases and one database with a fairly large schema.
This project takes a large amount of time to build and I'm about to pull my hair out. We'd like to keep our database source-controlled, but I'm having a hard time getting the other devs to use the project and build the database project before checking in, just because it takes so long to build.
It is seriously crippling our work, so I'm looking for alternatives. Maybe something can be done with Redgate's SQL Compare? I think maybe the only drawback there is that it doesn't validate syntax? Anyone's thoughts/suggestions would be most appreciated.
Please consider trying SQL Source Control, which is a product designed to work alongside SQL Compare as part of a database development lifecycle. It's in Beta at the moment, but it's feature complete and it's very close to its full release.
http://www.red-gate.com/products/SQL_Source_Control/index.htm
We'd be interested to know how this performs on a commit in comparison to the time it takes for Visual Studio to build your current Database Project. Do you actually need to build the project so often in VS that it's a problem? How large is your schema and how long is an average build?
Keeping Dev/live db in sync:
There are probably a whole host of ways of doing this, I'm sure other users will expand further (including software solutions).
In my case I use a twofold approach:
(a) run scripts to get the differences between the databases (stored procs, tables, fields, etc.) - see the sketch at the end of this answer
(b) Keep a strict log of db changes (NOT data changes)
In my case I've built up, over time, a semi-structured log like this:
Client_Details [Alter][Table][New Field]
{
EnforcePasswordChange;
}
Users [Alter][Table][New Field]
{
PasswordLastUpdated;
}
P_User_GetUserPasswordEnforcement [New][Stored Procedure]
P_User_UpdateNewPassword [New][Stored Procedure]
P_User_GetCurrentPassword [New][Stored Procedure]
P_Doc_BulkDeArchive [New][Stored Procedure]
Ignore the tabbing; the markdown has messed it up.
But you get the general gist.
I find that 99% of the time the log is all I need.
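For point (a), a rough sketch of such a difference script might look like this (assuming SQL Server and pyodbc; the connection strings, server and database names are invented for illustration):

import pyodbc

def schema_objects(conn_str):
    # Collect a crude inventory of columns and routines from INFORMATION_SCHEMA.
    query = """
        SELECT TABLE_NAME + '.' + COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS
        UNION
        SELECT ROUTINE_NAME FROM INFORMATION_SCHEMA.ROUTINES
    """
    with pyodbc.connect(conn_str) as conn:
        return {row[0] for row in conn.execute(query)}

dev = schema_objects("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=devbox;DATABASE=App_Dev;Trusted_Connection=yes")
live = schema_objects("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=prodbox;DATABASE=App_Live;Trusted_Connection=yes")

print("In dev but not live:", sorted(dev - live))
print("In live but not dev:", sorted(live - dev))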

In starteam, how can I find out when a file was deleted, and by whom?

We have a small team running StarTeam. A constant source of frustration and problems is the handling of deleted files in StarTeam. It is obvious that StarTeam keeps track of deleted files internally, but it does not seem to be possible to get any information about a file deletion.
So far, my only way to find the timing of a delete is to perform a manual binary search using the 'compare' views. Is there any better way? (The query for 'delete time' never seems to pick up any files.)
The Audit tab (just to the right of File, ChangeRequest, etc.) is probably your best bet if you're just looking for who deleted what and when. The Audit tab also provides information about when items and folders were created, shared, or moved, as well as when View labels are attached/detached. Whenever someone has files unexpectedly appear or disappear, I direct them to the Audit tab first.
There is a server-side configuration setting for the length of time the audit data is retained (30 days by default, I believe). Since it is not retained forever, it isn't a good option for historical data. The number of audits can be quite large in active views.
If you're looking for something more than that or older than your audit retention time, go with Bubbafat's suggestion of the SDK and getDeletedTime/getDeletedUserID.
Comparing views (or rolling back a view to see the item again) is the only way I know how to do this in StarTeam without writing code.
If you are willing to write a little code the StarTeam API provides the Item.getDeletedTime and Item.getDeletedUserId methods (I believe these showed up in 2006).
