Teamcity thinks agent is running a build, but it is not - teamcity

Using teamcity 5.1 (build 13360)
Usually everything works as expected
Occasionally it gets into a state where the server runs a build and it thinks the agent is running it, but the agent doesn't actually run the build
I see no sign of the build being run by the agent in its logs
So the build just runs forever on the server
I have restarted agents and the server

So it appears it recovers after a few hours
I just left a build running and it completed after a while
Don't know if restarting server / agents helped or not

Related

Jenkins is going to shut down when server is down

I have installed Jenkins as standalone in a server. But due to security policy of the company, server(wherever Jenkins is running) will be shutting down or restarting automatically after 24 hours. So Jenkins instance which is running in command line inside the server is also closing.
So Jenkins is becoming down
Creating a task scheduler to run Jenkins through command line is not helping since i'm running test automation scripts where the chrome browser is not opening if i schedule a task.
Jenkins is executing internally, Console is running for Web services. What it is executing on Web browser is not showing, as it didn't launch browser.

VSTS change agent state from offline to online after installing it

i'm facing a problem with VSTS agent state is offline i installed the agent through cmd under the right pool and downloaded after that but it still offline any help please ?
If the build agent is running as interactive mode, you need to start agent by running run.cmd file:
Open Command line as administrator
Run run.cmd file (under agent folder)
If the build agent is running as service, you can check whether the related service is running in Services.

Using a Windows VM from Jenkins through vsphere

I'm trying to reset-and-launch a Windows VM (in vsphere) during a Jenkins job. I successfully installed the vSphere Cloud Plugin. I've followed instructions to setup the Windows machine as a jenkins-mvn-slave, and have it setup to run as a service.
If I click on the button in Jenkins for Launch Slave Agent, I can see (in vsphere) that the VM does a revert snapshot, and then it does a power on virtual machine. If I attach to the machine, I can see that the Jenkins service starts automatically. However, back in Jenkins, it tells me that the Slave did not come online in allowed time.
Some key settings for my slave:
Force VM launch: Checked
Wait for VMTools: Not checked
Delay between launch and boot complete: 120
Secondary launch method: Launch slave agents view Java Web Start
Versions:
Jenkins: 1.596.2
vSphere: 5.5.0
Windows: Server 2012 R2 Standard, Build 9600
vSphere plugin: 2.7
What am I missing?
I've done a lot of messing around since I posted, but I think the following is what I was doing wrong. I first got the VM working as a normal slave agent. Once I had that working, then I tried to setup the same as a vsphere-cloud-slave-agent. I wasn't realizing that setting up a host as a slave agent is "agent-name specific".
So, I uninstalled the Jenkins service, launched the "vsphere cloud slave agent", logged into the machine, and ran javaws (as specified in the previously mentioned instructions.
A couple of other gotchas that I encountered (not relevant to the initial post, but maybe relevant to someone who reads this):
I originally installed git with a password manager. Unfortunately, since jenkins jobs aren't interactive, it was hanging on the git clone command. I tried uninstalling and re-installing git, but it didn't fix the problem for whatever user the jenkins slave was running as. I ended up having to revert to a previous slave image and install git from there. (I probably could have also figured out what user was running the jenkins slave, and entered the desired password there.)
I wanted to run a clean VM for each job. I never figured out this one. If I set Availability to Take this slave on-line when in demand and off-line when idle, that was a good start. However, if I set the times to 0 and 0, then the machine was constantly rebooting. If I set the times to 1 and 1, then the machine does mostly what I want, unless there are back-to-back jobs queued to run.

Jenkins doesn't recognize slave being down and thus does not allow for it to reconnect

We have a Jenkins instance running on Ubuntu that has several slaves in different systems. One of them is a Windows 7 host, having jenkins slave instance configured as a service.
We have a problem that when that machine is rebooted, master Jenkins doesn't realize it's gone. It looks to be just fine in the nodes view. Then, when a build is issued that is supposed to use that slave it gets stuck. If that is stopped, the next build fails immediately
Caused by: java.util.concurrent.TimeoutException: Ping started at 1457016721684 hasn't completed by 1457016961684
... 2 more
[EnvInject] - [ERROR] - SEVERE ERROR occurs: channel is already closed
When the slave has started up and it tries to connect back to master, connection is refused, and in the logs there is an error saying connection with that name already exists:
Server didn't accept the handshake: xxx is already connected to this master. Rejecting this connection.
There is issue JENKINS-5055 which claims a fix was committed allowing the same JNLP slave to reconnect without getting rejected, apparently this commit, and according to changelog, it was introduced in version 1.396 (2011/02/02). We are however using version 1.639 and seeing this. Somebody else seems to be seeing it as well. By looking at current codebase, I see where the error is coming from, but don't see the fix done in Jenkins-5055.
Any ideas on resolving this?
Edit: also asked on jenkins user mailing list, but no responses.
We faced the same issue. Used https://wiki.jenkins-ci.org/display/JENKINS/slave-status as workaround
Reinstalling the slave on a Windows Server 2012 R2 machine shows no signs of this behavior, so it seems that either there was a mistake done during installation steps or this is something caused by using a workstation Windows version.
Regardless, here were the steps to get it working, assuming a brand new installation of Windows, with no network connectivity, and master instance using a self-signed certificate:
Install JRE on the machine. If you have 64-bit operating system, install both 32-bit and 64-bit, otherwise go with 32-bit. Download link here
Install .NET 3.5 on the machine. This is needed by the Jenkins service. You can follow the steps outlined by my other answer for this.
Install Jenkins using Windows installer (.zipped) to C:\Jenkins. It can be downloaded from here.
Check your installation is responding by navigating to http://localhost:8080 . In case of trouble, check for logs in the jenkins folder. If there is a port conflict, edit jenkins.xml and change the httpPort to something else.
From the Windows computer, navigate to your master jenkins and configure a new node there.
Start a slave agent using Java Launch Agent in configure -> node screen (you need to be still using your Windows slave computer)
You should see a visible window opening. From there, select File -> Install as a service. (details and screenshots) If you experience an error without proper explanation, confirm .NET 3.5 is properly installed. If you see "WMI.WmiException: AccessDenied", save the jnlp file locally and start it from administrator prompt or otherwise with elevated privileges (details).
From the Administrative tools -> Services, stop and disable the Jenkins service, and stop Jenkins Slave Agent but leave it on Automatic so it will start up when starting up the computer.
This is only relevant if you're using a self-signed or otherwise problematic certificate:
download the previously mentioned Java Launch Agent file (.jnlp file) again and save it to C:\jenkins
open c:\jenkins\jenkins-slave.xml to your editor
change it to refer to your local .jnlp file by changing jnlp url parameter (file:/C:/jenkins/jenkins-slave.jnlp)
add -noCertificateCheck to parameters
replace the -secret parameter with -auth "user:pass", since otherwise automatic url get parameters will be added which will mess finding the .jnlp file
Start the Jenkins Slave Agent service again
For problems with jenkins slave service, check out jenkins-slave.err.log. For Windows Server 2012 R2, you can get the functionality of tail by using Get-Content .\jenkins-slave.err.log -Wait -Tail 10 in Powershell prompt. For older versions of Powershell, leave out -Tail 10.

Build failed after TeamCity agent shutdown

We have TeamCity with several agents and some agents run on employee's/programmers computer. If the employee shutdown computer in the evening, the running build is mark as failed and email notification is sent. Is is possible in the case of shutdown, to mark the build as cancelled and choose different agent and run the build again on different agent?
I guess if the computer is shutdown while TeamCity build is running, it [build] will be marked as failed anyway. So the question is how to rerun build after failure on new machine.
You can create agent pools and assign several build agents to the pool. After that you should edit 'Agent requirements' setting in build configuration, so your build can be run on any alive machine.
You can use 'Retry build trigger' in trigger sections to rerun your build after failure. It will rerun all failed builds, but it is OK for your problem
Build retry trigger (https://confluence.jetbrains.com/display/TCD8/Configuring+Build+Triggers) + TC REST API (https://confluence.jetbrains.com/display/TCD8/REST+API#RESTAPI-BuildRequests) to delete failed builds.

Resources