TortoiseHg is Slow - performance

Basically, what it says on the tin: TortoiseHg is slow.
My team recently moved from Subversion to Mercurial, in part to take advantage of Kiln for code reviews. One of the things we've noticed is that interacting with Mercurial through TortoiseHg is painfully slow. Some stats:
Open TortoiseHg Workbench: 8 minutes 13 seconds
Response time when clicking on a revision: 2.8 seconds
Time to "Refresh Current Repository": 6.4 seconds
Time to check for incoming changes: 12.8 seconds
All this really adds up to a very slow feeling application. For reference, here are the command line tool times:
hg status: 4.573 seconds
hg incoming: 12.150 seconds
The command-line times seem to jibe with the Workbench times, but the Workbench makes the delay much more frustrating, because the waiting is synchronous with using the program. For example, a typical task is "get the latest stuff my coworker just pushed". It looks like this (listing only the time spent waiting on the computer, rounded):
Open TortoiseHg: 10 minutes.
Open the appropriate repository by double-clicking in the repository registry: 5 seconds.
Commit local changes that need committing:
Click on "Working Directory": 5 seconds.
Select important files and type a commit message.
Press Commit: 20 seconds.
Get coworker's changes:
Check for incoming changesets: 10 seconds.
Review them.
Accept incoming changesets: 40 seconds.
Shelve unready changes:
Open Shelve dialog: 2 seconds.
Shelve remaining files: 6 minutes.
Refresh: 5 seconds.
Merge:
Click the other head: 3 seconds.
Merge with local:
Wait for "Clean" verification: 15 seconds.
Wait for merge (assuming no conflicts): 10 seconds.
Commit: 30 seconds.
Unshelve changes:
Open Shelve dialog: 2 seconds.
Unshelve: 6 minutes.
Refresh: 5 seconds.
Total: 24 minutes, 32 seconds.
Twelve of those minutes are spent shelving and unshelving. Ten are spent just opening. One consequence is that people tend to commit stuff they aren't sure will go anywhere, just to avoid the shelving cost. But even if you assume no shelving and no opening cost (maybe you just leave it open), it still takes two and a half minutes of meticulous clicking to get the latest stuff.
And that doesn't even count the more significant stuff like cloning and whatnot. Everything is this slow.
I have:
Disabled antivirus.
Disabled indexing.
Rebooted.
Tried it on 3 different versions of Windows.
Tried it on varying hardware, most of it of reasonable quality: Core 2 Duo @ 3.16 GHz, 8 GB RAM.
Tried it on 32-bit and 64-bit OSs.
Tried it disconnected from a network.
The repository is actually two repositories: a primary repo and a sub-repo that contains all our third-party binaries.
Primary repo: .hg folder 676 MB; contents of default 7.05 GB; 13,438 files; average file size 563 KB; max file size 170 MB.
Sub-repo: .hg folder 641 MB; contents of default 642 MB; 57,087 files; average file size 23 KB; max file size 132 MB.
I have big-push, caseguard, fetch, gestalt, kbfiles, kiln, kilnauth, kilnpath, mq, purge, and transplant extensions enabled.
Any ideas where to start figuring out how to speed stuff up? The slowness is driving us crazy.

Ok, answering my own question because I found the answer while following Tim's advice.
The culprit is kbfiles from Fog Creek. Disabling it dropped stat times from 12 seconds to 0.7 seconds. Likewise, the GUI now opens faster than I can time it. Re-enabling it makes everything slow down drastically again.
It doesn't look like every slow thing can be blamed on kbfiles, but the worst of it can. (Specifically, shelve is still pretty slow -- CPU bound. We can work around that, though.)
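For anyone else chasing the same symptom, a quick way to narrow it down is to see which extensions are loaded and disable suspects one at a time. A minimal sketch, assuming the suspect extension is enabled in your user-level config (hg config extensions shows what is currently active, and the global --time flag gives you before/after numbers):
hg config extensions
hg status --time
Then, in your mercurial.ini / .hgrc:
[extensions]
# a value of "!" disables an extension without uninstalling it
kbfiles = !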

That is a ton of files... and some are awfully big. How does it perform without the larger files? Binary files aren't exactly the best thing to track with hg/git, in my humble opinion.
What about breaking the big repo up into smaller ones? Do they really need to be two HUGE repos?
Maybe a defrag on the hard drives could slightly improve some of those times. Also look at the extensions that have been created to help deal specifically with big binary files. See here:
https://www.mercurial-scm.org/wiki/HandlingLargeFiles
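As a rough sketch of that last suggestion, assuming you go with the built-in largefiles extension (the other tools on that page work differently, and the 10 MB threshold here is only an illustration), you would enable it in .hgrc and then convert the repository once:
[extensions]
largefiles =
# one-time conversion into a new repository; files over 10 MB become largefiles
hg lfconvert --size 10 main-repo main-repo-largefiles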

In some cases the advice given in the documentation may be useful for improving THG speed:
5.4.8. Performance Implications
There are some Workbench features that could have performance implications in large repositories.
View ‣ Choose Log columns…
Enabling the Changes column can be expensive to calculate on repositories with large working copies, causing both refreshes and scrolling to be slow.
View ‣ Load all
Normally, when the user scrolls through the history, chunks of changesets are read as you scroll. This menu choice allows you to have the Workbench read all the changesets from the repository, probably allowing smoother moving through the history.
In my own experience these are definitely worth doing! You should at least try them and see if there is a noticeable effect.
Also, if you have read "Why is mercurial's hg rebase so slow?", there is a setting which can speed up rebase significantly:
By default, rebase writes to the working copy, but you can configure it to run in-memory for better performance, and to allow it to run if the working copy is dirty. Just add the following lines to your .hgrc file:
[rebase]
experimental.inmemory = True
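With that in place the command itself does not change; a typical invocation is still something like the following (the revision number and branch name here are placeholders):
hg rebase -s 1234 -d default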

Related

VB6 compiled is slow when copying files

I know, VB6 is historic...ok, but...
Years ago I wrote a backup program, not being satisfied with the commercial products I had tested.
Now I wanted to renew it with some enhancements and new graphics; the result is quite good for me. Since the file-copying process is generally rather slow, I thought I would compile it to squeeze out a few seconds... and instead... it is much slower.
Here is some info:
Win10-64 (version 22H2 just upgraded)
Tested on the same PC with identical parameters
VB6 runs with admin privileges, in Win7 SP3 compatibility mode.
Even if it is not relevant here, the job was to copy a folder containing 426 other folders and 4,598 files of different sizes (from 1 kB to 435 MB, for a total of 1.05 GB) from an internal SSD to an external SSD.
The interpreted version took 7.2 sec, while the compiled version took 18.6 sec!
I tried different native-code compilation settings, turning off all the advanced checks on array bounds, integer overflow, and floating point, without any notable difference.
I could accept a small difference for some unknown reason, but it is unreal to get a 2.5:1 ratio.
Any idea?
EDIT
Based on comments:
I repeated the comparison several times; the variation (in both the compiled and the interpreted mode) is around +/- 1 sec.
Files are copied using FileSystemObject.CopyFile.
My admin privileges are the same for both.
Again, I'm not complaining about or worried by the absolute time the copy takes; I can live with that, since it is an operation done weekly and during quiet hours.
What is surprising is WHY it happens.
Even the idea of compiling the program was due to curiosity, since there is very little to optimize in the code; it is just a For-Next loop with very few calculations and assignments.
The program takes the directory and file info from a text-based DB created by recursively scanning the source folder, then loads it into a custom array... pretty simple.
This is done before the actual copy phase, which is what I'm investigating.

Very slow copy of MyISAM .MYD file

We noticed that a few of our MyISAM .MYD files (MySQL database tables) copy extremely slowly. Both the C: drive and the D: drive are SSDs; the theoretical limit is a 500 MB/sec data rate. For the timings, we turn off the MySQL service. Here are some sample timings for the file test.myd (6 GB):
NET STOP MYSQL56
Step1: COPY D:\MySQL_Data\test.myd C:\Temp --> 61MB / sec copy speed
Step2: COPY C:\Temp\test.myd D:\temp --> 463 MB / sec
Step3: COPY D:\Temp\test.myd c:\temp\test1.myd --> 92 MB / sec
Strange results; why would the speed in one direction be so different from the other direction?
Let's try this:
NET START MYSQL56
in MySQL: REPAIR TABLE test; (took about 6 minutes)
NET STOP MYSQL56
Step4: COPY D:\MySQL_Data\test.myd C:\Temp --> 463 MB / sec
Step5: COPY C:\Temp\test.myd D:\temp --> 463 MB / sec
Step6: COPY D:\Temp\test.myd c:\temp\test1.myd --> 451 MB / sec
Can anybody explain the difference in copy speed?
What might have caused the slow copy speed in the first place?
Why would REPAIR make a difference when OPTIMIZE, which we tried first, did not?
Would there be any kind of performance hit at the SQL level with the initial version (i.e., before the REPAIR)? Sorry, I did not test this out before running these tests.
REPAIR would scan through the table and fix issues that it finds. This means that the table is completely read.
OPTIMIZE copies the entire table over, then RENAMEs it back to the old name. The result is as if the entire table were read.
COPY reads one file and writes to the other file. If the target file does not exist, it must create it; this is a slow process on Windows.
When reading a file, the data is fetched from disk (SSD, in your case) and cached in RAM. A second reading will use the cached copy, thereby being faster.
This last bullet item may explain the discrepancies you found.
Another possibility is "wear leveling" and/or "erase-before-write" -- two properties of SSDs.
Wear leveling is when the SSD moves things around to avoid too much "wear". Note that an SSD block "wears out" after N writes to it. By moving blocks around, this physical deficiency is avoided. (It is a feature of enterprise-grade SSDs, but may be missing on cheap drives.)
Before a write can occur on an SSD, the spot must first be "erased". This extra step is simply a physical requirement of how SSDs work. I doubt if it factors into your question, but it might.
I am removing [mysql] and [myisam] tags since the question really only applies to file COPY with Windows and SSD.

Bamboo build-dir excessive space - can it be cleaned up with a cron job?

We use Bamboo CI. There are multiple bamboo local agents and parallel builds across many plans. The build-dir in bamboo-home is many hundreds of gigabytes, and analysis shows that it just continually grows as new feature branches are added. Plans seem to be duplicated in each local agent directory, and also directly in build-dir.
Unlike expiring artifacts, Bamboo does not seem to clean this up by itself. For example, if a local agent is removed, that local agent's build directory sits there forever, taking up a significant amount of space.
Plans can be set to clean up at the end of a build; however, this hampers problem analysis when a post-mortem on a build is needed.
Because the directory was running out of space, I have just added a daily cron task to periodically remove files and directories that haven't been accessed for more than 21 days. When I first ran this manually I reclaimed 300 GB from a 600 GB partition. I want to know if others have encountered this same issue, and whether it is safe to externally clean the build-dir in the long term. Could it impact Bamboo builds? Is there some Bamboo option that I have missed that would do this for me?
Searching on the Atlassian site has not been helpful and yields no answers... what are others doing to tame this space hog?
The cron job has been running for a while now without any issues, and it is keeping the space usage under control.
I have reduced the parameter to 15 days.
My crontab looks like this:
# clean up old files from working directory
0 20 * * * find /<path_to>/bamboo-home/xml-data/build-dir/ -depth -not -path *repositories-cache* -atime +15 -delete
# clean up old backups every Sunday
0 21 * * 0 find /<path_to>/bamboo-home/backups -type f -mtime +28 -delete
# remove any old logs from install directory after 15 days
0 22 * * * find /<path_to>/bamboo/logs/ -type f -mtime +15 -delete
# quick and dirty truncate catalina.out to stop it growing too large (or better still use logrotate)
0 23 * * * cat /dev/null > /<path_to>/bamboo/logs/catalina.out
I hope this is useful for others trying to tame bamboo's diskspace usage.
The first job is the important one, the last three are just housekeeping.
N.B. logrotate is not used on catalina.out due to unique circumstances in my company's outsourced Linux environment. I would generally recommend logrotate, if possible, rather than my quick and dirty truncate method - see the answer by Jon V.
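If you adapt that first job to your own layout, it is worth a dry run before trusting it to cron: the same find with -print instead of -delete shows what would be removed (I've also quoted the cache glob here so the shell cannot expand it):
find /<path_to>/bamboo-home/xml-data/build-dir/ -depth -not -path '*repositories-cache*' -atime +15 -print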
While the cron idea works well, the thing that I've also done in the past with Bamboo is to use the "Clean working directory after each build" option. Basically, for any given job, there's a config option that will clean up the appropriate build-dir/<build_plan_job> directory for a given plan/job:
Actions -> Configure Plan -> click the Job -> Miscellaneous Tab -> first checkbox
While that makes sure that future build scratch areas are cleaned up, it does not help for already existing and/or old builds. Given the normal Git-style workflow where you have lots of branches (each branch creates a specific job ID, like PLAN-JOB_WITH_BRANCH_NUMBER-BUILD_NUMBER or similar), that gets old/large fast. I just did a quick check, and while we're now cleaning up the build areas for most builds (the large ones at least), we still have over 100 GB of build cruft from branches that were merged loooong ago.
Thanks for the cron example, though, that should work OK for the future.
Unrelated: the more I use Bamboo, the more I love/hate it.
EDIT: as a general comment, I'd try really hard to work with an SA to get a logrotate rule set up/implemented for the catalina.out - overwriting with /dev/null seems like a really bad idea, unless you're already slurping them up with something like ELK or Splunk.
My /etc/logrotate.d/bamboo_catalina_out looks like (using your paths):
/<path_to>/bamboo/logs/catalina.out {
create 0660 bamboo bamboo
compress
copytruncate
missingok
rotate 10
size 100M
}
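If you set this up, logrotate's debug flag gives a harmless dry run (assuming the snippet is installed at the path implied above):
logrotate -d /etc/logrotate.d/bamboo_catalina_out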
Finally - is there a reason why you have both the third and fourth cron scripts?
You can follow these steps:
Login with an Admin account
Go to Administration (the cog icon in the top right corner)
Select Expiry from the left hand side menu
Click Edit and configure (if you haven't already) the Global expiry settings and set up a schedule for executing it
Click Save
If you want to execute it immediately click the Run Now button

Rcpp in RStudio: can't cache in memory when running in parallel unless I open the cpp file in RStudio

I ran into a weird problem, though I wonder if I'm asking the right question:
result = parLapply(cl, 1:4,
                   function(j, rho_list_needed, delta0_needed,
                            V_iter_s, Sigma_list_needed) {
                     rhoj = rho_list_needed[[j]]
                     delta0_in_cpp = delta0_needed
                     v = as.vector(V_iter_s[,,,j])
                     sigmaj = Sigma_list_needed[[j]]
                     sourceCpp('sample_Z.cpp')  # first compile is slow, then cached
                     return(Sample_Z(rhoj, delta0_in_cpp, v, sigmaj, A, Cmatrix))
                   },
                   rho_list_needed, delta0_needed,
                   V_iter[[s]], Sigma_list_needed)
When I was testing my sample_Z.cpp in parallel through parLapply, a single calculation takes around 1 sec. In parallel, my 4 iterations take around 1.2 sec, which is a big improvement compared to the unparallelized version, which takes 8 sec.
There was no problem at all when I ran my program yesterday. Just now I noticed a bug and revised my program. To give my PC a fresh environment, I restarted my computer. When I started to run my program, I only opened the .R file and ran it. But the parallel part took 9 sec, where it used to take 1.2 sec. The 9 sec was measured after warming up my cores, i.e., I had already sourced the cpp before timing it.
I just don't know where the bug is. I then tried to source the cpp file directly in my global environment, and found that there was no caching at all: the second run took the same time as the first.
But I accidentally opened sample_Z.cpp in RStudio, explicitly in the editor, and then everything worked correctly.
I don't know what keywords to search for to find a similar problem on Google, and I don't know whether opening the cpp file is a must; I had never needed to do that before.
Can anyone tell me what's the real issue? Thanks!
After restarting your PC, you probably had extra processes running which would have competed for CPU cores and slowed down your algorithm. The fact that you're rebooting suggests to me you're not using Linux... but if you are, watch with top while starting your code, or the equivalent for your platform.

HP Fortify: issues while handling very large FPR reports on the Fortify server

We have a huge source-code base. We scan it using HP SCA and create an FPR file (approx. 620 MB in size). Then we upload it to our Fortify server using the "fortifyclient" command.
After uploading, if I log into the Fortify server and go into the details of that project, I see that the artifact is in the "processing" stage. It remains in the processing stage even after a few days. There is nothing provided on the dashboard that I can use to stop/kill/delete it.
Ques 1: Why is it taking so long to process? (We have one successfully processed FPR report; it took 6 days.) What can we do to make it faster?
Ques 2: If I want to delete an artifact while it is in the processing stage, how do I do that?
Machine info:
6 CPUs (Intel(R) Xeon(R), 3.07 GHz)
36 GB RAM
Thanks,
Addition:
We had one report for the same codebase that was successfully processed earlier in the month. The FPR file for that was also of a similar size (610 MB). I can see the issue count for that report. Here it is:
EDIT:
Fortify Version: Fortify Static Code Analyzer 6.02.0014
HP Fortify Software Security Center Version 4.02.0014
Total issues: 157000
Total issues Audited: 0.0%
Critical issues: 4306
High: 151200
Low: 1640
Medium: 100
That's a large FPR file, so it will need time to process. SSC is basically unzipping a huge ZIP file (that's what an FPR file is) and then transferring the data into the database. Here are a few things to check:
Check the amount of memory allotted for SSC. You may need to pass up to 16 GB of memory as the Xmx value to handle an FPR that size, maybe more (see the sketch after this list). The easiest way to tell would be to upload the FPR and then watch the Java process that your app server uses. See how long it takes to reach the maximum amount of memory.
Make sure the database is configured for performance. Having the database on a separate server, with the data files on another hard drive, can significantly speed up processing.
As a last resort, you could also try making the FPR smaller. You can turn off source rendering so that the source code is not bundled with the FPR file, using this command:
sourceanalyzer -b mybuild -disable-source-bundling
-fvdl-no-snippets -scan -f mySourcelessResults.fpr
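On the memory point above: the exact knob depends on how SSC is deployed, but as a sketch, if it runs under Tomcat the heap is usually raised in a bin/setenv.sh (or setenv.bat) next to the startup scripts. The path and the 16g figure are assumptions to adapt, not Fortify-specific values:
# hypothetical <tomcat>/bin/setenv.sh for the app server hosting SSC
export JAVA_OPTS="$JAVA_OPTS -Xms4g -Xmx16g"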
As for deleting an in-progress upload, I think you have to let it finish. With some tuning, you should be able to get the processing time down.
