Failed to start build #XXX on agent, disable the agent? - teamcity

I have a set of Java GUI tests on Windows that sometimes fail to clean up a file or a window due to a lock. The next time any test starts, I get a "Failed to start build #XXX on agent" message. The build then fails very quickly on that agent, which frees the agent to run another build, and this results in many builds failing with the same message on the same agent.
Is there a way to disable the agent when the problem occurs and maybe send a notification?

Rather than disabling the agent, you could try cleaning up the files that get locked. Try using Swabra (Build Files Cleaner).

Related

Why does TeamCity send a keyboard interrupt to kill my build?

I have a build running in TeamCity, with only one build step: launching a BAT file. TeamCity sometimes kills my build with a (double) keyboard interrupt, and I have no idea why. The output at the end of the build is like this:
Running build failed.
Error:
NUnit test failed (7).
Starting BuildFailureTarget: recover
Uninstalling service under test..Terminate batch job (Y/N)?
^C
Process exited with code -1073741510
This build runs some integration tests via NUnit, after installing a Windows service, with a SQL database. If any of the tests fail, the build script (which uses FAKE, F#'s Make) runs some cleanup: it uninstalls the service and tears down the database. It's the same cleanup code that runs when the build passes; only the target name is different (recover). It seems that TeamCity only kills the build when some tests have failed. I should note that the message "Uninstalling service under test" is from a subprocess which is running the uninstaller. This still happens even if we turn off several failure conditions such that the build (spuriously) passes after several tests fail (we are not using Java, so we assume that one is irrelevant).
I can't figure out why TeamCity is killing my build before it is done. How do I figure out what would cause TeamCity to issue this interrupt?
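One clue is in the exit code itself. As a rough sketch (in Python, purely for illustration; nothing in the original build uses it), the signed value from the log can be reinterpreted as an unsigned 32-bit number, which is the form Windows status codes are usually listed in. The result, 0xC000013A, is the value Windows defines as STATUS_CONTROL_C_EXIT, which is consistent with the "^C" shown in the output.

```python
# Reinterpret the signed exit code from the build log as an unsigned
# 32-bit value, the form in which Windows NTSTATUS codes are listed.
exit_code = -1073741510  # value taken from the log above

unsigned = exit_code & 0xFFFFFFFF
status_hex = f"0x{unsigned:08X}"

# 0xC000013A is STATUS_CONTROL_C_EXIT: the process ended due to Ctrl+C.
print(status_hex)
```

This only confirms that the process was ended by a Ctrl+C style interrupt; it does not say who sent it.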
It seems that TeamCity does this if it detects dangling processes (not sure how to be more precise about that). What we had happening was that an exception was being thrown by a third-party library while we were running a subprocess, before the code that stopped that process. The exception was handled, and the cleanup that got triggered by the exception would have resulted in the process getting shut down anyway (through another means), but before that cleanup was finished, TeamCity was killing our build, which ironically meant that the process never did get shut down.
Our solution was to catch the exception and ensure the first shutdown code got called before failing. Ultimately, we were not able to get more clarity from the TeamCity side on what was happening; we found the bug by careful analysis of our code. However, it seems that this happens when standard cleanup logic for subprocesses fails.
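The general shape of that fix can be sketched as follows. This is not the poster's FAKE/F# script; it is a minimal Python illustration, and the names `run_with_helper`, `helper_cmd`, and `work` are hypothetical. The idea is only that the code stopping the subprocess sits in a `finally` block, so it runs even when the work in between raises.

```python
import subprocess
import sys


def run_with_helper(helper_cmd, work):
    """Start a helper process, run work(helper), and always stop the helper."""
    helper = subprocess.Popen(helper_cmd)
    try:
        return work(helper)
    finally:
        # Runs whether work() returned or raised, so the helper process
        # does not outlive the build step as a dangling process.
        helper.terminate()
        try:
            helper.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # The helper ignored the polite request; force it to stop.
            helper.kill()
            helper.wait()


# Hypothetical stand-in for a long-running service under test.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(60)"]
```

Calling `run_with_helper(SLEEPER, some_function)` lets any exception from `some_function` propagate to the caller, but only after the helper has been stopped.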

TeamCity JMeter plugin not showing remote monitoring

I've followed this guide to have my TeamCity build running some JMeter tests, but I'm not seeing the "RemotePerfMon" tab for the server statistics. I have the "Performance Statistics" tab, and I can see that the statistics are definitely being collected, as there is a monitoring.csv file being created and populated in the build agent's work directory.
Any ideas on how I can get the tab to display?
I'm using TeamCity v9.1.6 with JMeter plugin version 83, everything running on Windows 8.
Additional Info:
I've found that there is an open issue on GitHub for this problem, so I'm obviously not the only one facing this issue.
Make sure TeamCity is NOT running as Administrator
After quite a while playing around with it, I discovered that the problem was that both the TeamCity server and the TeamCity build agent were running on the same machine, but the build agent was running as Administrator. Stopping both services and restarting them as a regular user fixed the issue.
I believe the root of the issue was that the monitoring.csv file was created by the build agent as Administrator, and then when the non-admin server attempted to parse it, it failed. This error doesn't seem to get logged anywhere, and TeamCity responds to the error by simply not displaying the tab.

TFS Problems communicating between Build Controller and Build Agent on same machine

We're working on setting up a TFS server for our work, and I'm in charge of getting the build working. I have had no experience with TFS before, but setting the build controller and agents up using the wizards was easy enough. We have the TFS server on one machine, and a build controller and build agent on another machine registered to the TFS server.
When I start a build from my developer machine, the build reports as having started and the status of the controller changes to something like "running build vstfs://Build/Build/16". However, the status of the Agent never changes from "Ready" and the build hangs indefinitely. If I stop the build from my developer machine, it reports that the "build was forcefully stopped by the server because the build machine did not respond to a stop request", and the build controller still has the status of "running build". I need to restart the build controller in order to reset the status.
I've checked that port 9191 is unblocked, and I can telnet into the port from my developer machine. The server also seems to be able to communicate with the build machine, as the controller is receiving build requests, but I have no idea what to do from here. Any TFS experts have any idea what might be happening?
Thanks,
Zach
Found the problem.
Under the build service properties, we had the value "Listen for build agent communication on:" set to [BUILDAGENT.companydomain.com:9191/Build/v5.0/Services]. We needed the value to be just [BUILDAGENT:9191/Build/v5.0/Services].

How to stop a build on a build agent with an HTTP command

I need to stop a build on an agent with a script, then disable it.
I found in the forums a way to disable/enable agents:
http://teamcity:8080/httpAuth/ajax.html?reason=&_should_restore_status=&status_restoring_delay=15&changeAgentStatus=<AGENT_ID>&enable=false&_=
http://teamcity:8080/httpAuth/ajax.html?reason=&_should_restore_status=&status_restoring_delay=15&changeAgentStatus=<AGENT_ID>&enable=true&_=
Is there an HTTP command to stop the running build on a particular agent?
There appears to be an undocumented feature for cancelling a build. Support for this feature is by no means guaranteed. :)
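For the disable/enable half of the question, the two URLs quoted above differ only in the `enable` value, so a script can build them from one helper. The sketch below is in Python and only constructs the URL; the function name `agent_status_url` is hypothetical, the query parameters are copied from the question as-is, and since that endpoint comes from a forum post it may not exist in every TeamCity version. Sending the request is left out because it needs credentials (the `httpAuth` prefix suggests HTTP basic authentication).

```python
from urllib.parse import urlencode


def agent_status_url(base_url, agent_id, enable):
    """Build the agent enable/disable URL quoted in the question."""
    # Parameter names and order are copied from the question's URLs.
    params = [
        ("reason", ""),
        ("_should_restore_status", ""),
        ("status_restoring_delay", "15"),
        ("changeAgentStatus", str(agent_id)),
        ("enable", "true" if enable else "false"),
        ("_", ""),
    ]
    return f"{base_url}/httpAuth/ajax.html?{urlencode(params)}"


# Example with a made-up agent id of 7.
disable_url = agent_status_url("http://teamcity:8080", 7, enable=False)
print(disable_url)
```

A script that stops a build and then disables the agent would request the cancel URL first and this one second.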

Run failed tests from TFS post build event

We are using TFS/VS 2010 to run Selenium tests which are scheduled in the TFS controller. After the build and tests are finished I would like to run the failed tests from that build.
Currently I am doing this with a Windows Scheduled Task that executes a batch file. The batch file calls a PowerShell script which gets the latest build version (and its failed tests), executes those tests using mstest, and finally publishes the results back to the build.
I just want this to happen without a Windows Scheduled Task, because it is too fickle. I believe I need to edit ProcessTemplate.xaml and add an event (InvokeProcess) to achieve this, but I can't find much on it.
Thanks in advance!
