Jenkins doesn't recognize slave being down and thus does not allow for it to reconnect - windows-7

We have a Jenkins instance running on Ubuntu that has several slaves in different systems. One of them is a Windows 7 host, having jenkins slave instance configured as a service.
We have a problem that when that machine is rebooted, master Jenkins doesn't realize it's gone. It looks to be just fine in the nodes view. Then, when a build is issued that is supposed to use that slave it gets stuck. If that is stopped, the next build fails immediately
Caused by: java.util.concurrent.TimeoutException: Ping started at 1457016721684 hasn't completed by 1457016961684
... 2 more
[EnvInject] - [ERROR] - SEVERE ERROR occurs: channel is already closed
When the slave has started up and it tries to connect back to master, connection is refused, and in the logs there is an error saying connection with that name already exists:
Server didn't accept the handshake: xxx is already connected to this master. Rejecting this connection.
There is issue JENKINS-5055 which claims a fix was committed allowing the same JNLP slave to reconnect without getting rejected, apparently this commit, and according to changelog, it was introduced in version 1.396 (2011/02/02). We are however using version 1.639 and seeing this. Somebody else seems to be seeing it as well. By looking at current codebase, I see where the error is coming from, but don't see the fix done in Jenkins-5055.
Any ideas on resolving this?
Edit: also asked on jenkins user mailing list, but no responses.

We faced the same issue. Used https://wiki.jenkins-ci.org/display/JENKINS/slave-status as workaround

Reinstalling the slave on a Windows Server 2012 R2 machine shows no signs of this behavior, so it seems that either there was a mistake done during installation steps or this is something caused by using a workstation Windows version.
Regardless, here were the steps to get it working, assuming a brand new installation of Windows, with no network connectivity, and master instance using a self-signed certificate:
Install JRE on the machine. If you have 64-bit operating system, install both 32-bit and 64-bit, otherwise go with 32-bit. Download link here
Install .NET 3.5 on the machine. This is needed by the Jenkins service. You can follow the steps outlined by my other answer for this.
Install Jenkins using Windows installer (.zipped) to C:\Jenkins. It can be downloaded from here.
Check your installation is responding by navigating to http://localhost:8080 . In case of trouble, check for logs in the jenkins folder. If there is a port conflict, edit jenkins.xml and change the httpPort to something else.
From the Windows computer, navigate to your master jenkins and configure a new node there.
Start a slave agent using Java Launch Agent in configure -> node screen (you need to be still using your Windows slave computer)
You should see a visible window opening. From there, select File -> Install as a service. (details and screenshots) If you experience an error without proper explanation, confirm .NET 3.5 is properly installed. If you see "WMI.WmiException: AccessDenied", save the jnlp file locally and start it from administrator prompt or otherwise with elevated privileges (details).
From the Administrative tools -> Services, stop and disable the Jenkins service, and stop Jenkins Slave Agent but leave it on Automatic so it will start up when starting up the computer.
This is only relevant if you're using a self-signed or otherwise problematic certificate:
download the previously mentioned Java Launch Agent file (.jnlp file) again and save it to C:\jenkins
open c:\jenkins\jenkins-slave.xml to your editor
change it to refer to your local .jnlp file by changing jnlp url parameter (file:/C:/jenkins/jenkins-slave.jnlp)
add -noCertificateCheck to parameters
replace the -secret parameter with -auth "user:pass", since otherwise automatic url get parameters will be added which will mess finding the .jnlp file
Start the Jenkins Slave Agent service again
For problems with jenkins slave service, check out jenkins-slave.err.log. For Windows Server 2012 R2, you can get the functionality of tail by using Get-Content .\jenkins-slave.err.log -Wait -Tail 10 in Powershell prompt. For older versions of Powershell, leave out -Tail 10.

Related

Using a Windows VM from Jenkins through vsphere

I'm trying to reset-and-launch a Windows VM (in vsphere) during a Jenkins job. I successfully installed the vSphere Cloud Plugin. I've followed instructions to setup the Windows machine as a jenkins-mvn-slave, and have it setup to run as a service.
If I click on the button in Jenkins for Launch Slave Agent, I can see (in vsphere) that the VM does a revert snapshot, and then it does a power on virtual machine. If I attach to the machine, I can see that the Jenkins service starts automatically. However, back in Jenkins, it tells me that the Slave did not come online in allowed time.
Some key settings for my slave:
Force VM launch: Checked
Wait for VMTools: Not checked
Delay between launch and boot complete: 120
Secondary launch method: Launch slave agents view Java Web Start
Versions:
Jenkins: 1.596.2
vSphere: 5.5.0
Windows: Server 2012 R2 Standard, Build 9600
vSphere plugin: 2.7
What am I missing?
I've done a lot of messing around since I posted, but I think the following is what I was doing wrong. I first got the VM working as a normal slave agent. Once I had that working, then I tried to setup the same as a vsphere-cloud-slave-agent. I wasn't realizing that setting up a host as a slave agent is "agent-name specific".
So, I uninstalled the Jenkins service, launched the "vsphere cloud slave agent", logged into the machine, and ran javaws (as specified in the previously mentioned instructions.
A couple of other gotchas that I encountered (not relevant to the initial post, but maybe relevant to someone who reads this):
I originally installed git with a password manager. Unfortunately, since jenkins jobs aren't interactive, it was hanging on the git clone command. I tried uninstalling and re-installing git, but it didn't fix the problem for whatever user the jenkins slave was running as. I ended up having to revert to a previous slave image and install git from there. (I probably could have also figured out what user was running the jenkins slave, and entered the desired password there.)
I wanted to run a clean VM for each job. I never figured out this one. If I set Availability to Take this slave on-line when in demand and off-line when idle, that was a good start. However, if I set the times to 0 and 0, then the machine was constantly rebooting. If I set the times to 1 and 1, then the machine does mostly what I want, unless there are back-to-back jobs queued to run.

Bamboo remote build agent cannot find powershell.exe after installing nodejs

I just installed nodejs on one of my build servers (Win Server 2008 R2) which hosts a Bamboo remote agent. After completing the installation and doing a reboot I got stuck in the following situation:
The remote Bamboo build agent is running as a windows service with user MyDomain\MyUser. When a build with an inline powershell task is executing it fails with the error (from the build agent log):
com.atlassian.utils.process.ProcessNotStartedException: powershell could not be started
...
java.io.IOException: Cannot run program "powershell"
...
java.io.IOException: CreateProcess error=2, The system cannot find the file specified
Loggin on to the server as MyDomain\MyUser, I have checked that powershell is in the path:
where powershell
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
I have tried to restart the service and reboot the machine multiple times. No luck. The only thing that works is if I execute my scripts as a bat file with an absolute path to powershell - but I do not want that.
I have searched for solutions on this, but even though this one seems related: Hudson cannot find powershell after update to powershell 3 - the proposed solutions do not work.
What am I missing here?
If you do a default installation of nodejs you will see that it adds nodejs and npm to the path. Sometimes I have seen that the installer adds a user variable named PATH - it might be that the Bamboo agent decides to read the user path without "merging" it with the system path. I think it would be worth a try to give that a look.
As per Atlassian support page, this is related to a bug in Java Service Wrapper. I tried Workaround-2 since there was no user PATH variable in my system. I had to uninstall bamboo agent service and Java 64 versions from the agent machine to apply the workaround-2.

RabbitMQ fails on Error: unable to connect to node rabbit#TPAJ05421843: nodedown

On a Windows 7 Enterprise machine, I made a fresh install of Erlang 17.4 and RabbitMQ 3.4.3 x64. The installation was successful and uneventful.
I have not yet tried to create my first queue or exchange, but I already see trouble. This problem is similar to another SO post, but that other post appears to involve clustering, which I don't have. Furthermore, that other poster can circumvent his issue by restarting the RabbitMQ service; that approach does not work for me.
My "nodedown" problem is evident at the RabbitMQ command prompt:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.4.3\sbin>rabbitmqctl status
Status of node rabbit#TPAJ05421843 ...
Error: unable to connect to node rabbit#TPAJ05421843: nodedown
DIAGNOSTICS
attempted to contact: [rabbit#TPAJ05421843]
rabbit#TPAJ05421843:
* connected to epmd (port 4369) on TPAJ05421843
* epmd reports: node 'rabbit' not running at all
other nodes on TPAJ05421843: ['RabbitMQ']
* suggestion: start the node
current node details:
- node name: 'rabbitmqctl-19884#TPAJ05421843'
- home dir: H:\
- cookie hash: PD4QQCYrf0TME9vIko3Xuw==
Based on the above, I chose to check the status of the node explicitly named 'RabbitMQ'. I get this:
C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.4.3\sbin>rabbitmqctl -n RabbitMQ status
Status of node 'RabbitMQ#TPAJ05421843' ...
Error: unable to connect to node 'RabbitMQ#TPAJ05421843': nodedown
DIAGNOSTICS
attempted to contact: ['RabbitMQ#TPAJ05421843']
RabbitMQ#TPAJ05421843:
* connected to epmd (port 4369) on TPAJ05421843
* epmd reports node 'RabbitMQ' running on port 59301
* TCP connection succeeded but Erlang distribution failed
* suggestion: hostname mismatch?
* suggestion: is the cookie set correctly?
current node details:
- node name: 'rabbitmqctl-23076#TPAJ05421843'
- home dir: H:\
- cookie hash: PD4QQCYrf0TME9vIko3Xuw==
Ok, this is barely better since at least it acknowledges 'RabbitMQ' running on port 59301. But what the heck could it mean that "Erlang distribution failed"?
When I try to research this topic, I found articles saying "be sure you have matched cookies." Based on that I found this article, which claims the "cookie mismatch" does not pertain to me, because I have not created (nor intend to create) a RabbitMQ cluster.
What should I do?
I had this same problem today. There were no cookie or firewall problems and windows reported that the service was running successfully. This is what finally fixed it:
Run RabbitMQ sbin command prompt as administrator.
Run "rabbitmq-service remove"
Run "rabbitmq-service install"
For some reason the service set up by the installer did not configure several registry entries. Running this set them correctly and allowed the service to run.
One thing I noticed was that before I did this, there was no description of the service in the Windows Services view. After installing with the rabbitmq-service command, the description was visible. This might be a quick indicator if you are having the same problem.
As #eddyP commented, I had two different Erlang cookie files:
A server cookie file, located at $env:WINDIR\system32\config\systemprofile\.erlang.cookie (prior to Erlang 20.2 it was located at $env:WINDIR\.erlang.cookie).
A client cookie file, located at $env:USERPROFILE\.erlang.cookie.
Copying the server cookie file over the client one, so that both files were the same, fixed the problem for me.
For further details, see "How Nodes (and CLI tools) Authenticate to Each Other: the Erlang Cookie".
From RabbitMQ Command Prompt sbin (run as administrator) execute this command:
rabbitmq-server restart
In Windown, For some reason delete all folder in c:\Users\xxx\AppData\Roaming\RabbitMQ\db\ (xxx is your username)
then flow #Jerdev answer and
start rabbitmq net start rabbitmq
check rabbitmq service rabbitmqctl status
The same question on the RabbitMQ mailing list: https://groups.google.com/forum/#!topic/rabbitmq-users/0s1ExFhl4hM.
The Erlang cookie is used by rabbitmqctl as well as server nodes, so it may need being taken care of (placed in the correct location).
See "Installing as a non-administrator user leaves .erlang.cookie in the wrong place" on Windows quirks.
I resolve my problem doing this in Windows 10.
Execute RabbitMQ Command Prompt (sbin dir) as administrator.
Execute "rabbitmq-service remove" in (RabbitMQ Command Prompt).
Execute %AppData% in Run Dialog Box of Windows.
Delete all files in RabbitMQ folder.
Execute "rabbitmq-service install" in (RabbitMQ Command Prompt).
Execute "rabbitmqctl start_app" in (RabbitMQ Command Prompt).
If you come here looking for a linux answer for the same error message, try
sudo service rabbitmq-server start
(which is not a blocking command)
Just do the following:
Uninstall rabbitmq and erlang.
delete the rabbitmq folder existing in your appdata (if you dont
know the appdata location, just type echo %AppData% in the command
prompt)
Then install erlang first and then rabbitmq.
After installing, enable the management plugin using below command:
rabbitmq-plugins enable rabbitmq_management
For me the cookies didnt match, like the other comments but the locations was in a different path for those having the same issue as me C:\Windows\System32\config\systemprofile
That is happening because rabbit MQ is not being installed correctly on Windows (and this error is misleading!). So to solve it do the following:
type "cmd" in Cortana search or in "Run" for older version of Windows
right click on in and choose "Run as Administrator"
go to rabbit's sbin folder (cd "C:\Program Files\RabbitMQ Server\rabbitmq_server-3.7.4\sbin")
run: rabbitmq-service remove
run: rabbitmq-service install
now you can run
6. rabbitmq-plugins enable rabbitmq_management
7. rabbitmq-service start
8. and, finally, run: start http://localhost:15672
9. log on as user "guest" with password: "guest" and that's it. Happy Rabbiting!
I missed restarting my WINDOWS OS and then deleting the old version of ERLANG (which I uninstalled before restarting).
Somehow the fresh installation of Rabbit was referring to the old (un-installed version) and all the mismatch was happening. Clue was the 'services' referred Rabbit from the old ERLANG version.
This is how I resolved the error in my Windows 8 system:
Check for a syntax error in the rabbitmq.config file placed in the AppData folder for Windows.
How to check if there is any syntax error?
You can run rabbitmq-server restart from sbin folder in:
Program Files/RabbitMQ/rabbitmq_server_x.x/sbin/.
Replace the content of the rabbitmq.config with rabbitmq.config.example.
You may find the rabbitmq.config.example in:
Program Files/RabbitMQ/rabbitmq_server_x.x/etc/
Warning, you will lose the configuration you have saved previously with rabbitmq.
After changing the files, just hit
rabbitmq-server restart
in the sbin folder mentioned above.

SonarQube installation failing to start service

I'm installing sonarqube on Windows Server 2012.
I have followed the following steps:
Downloaded sonarqube4.4 and extracted to C:\Sonarqube
Downloaded Java JDK 1.7.0_60 and jre 1.7.0_67 as well as jre7
Installed Windows SDK 7 and .NET Framework 4
Navigated to C:\sonar\bin\windows x86-64 and ran StartSonar.bat as an administrator, this ran ok with no output and Ihad to hot ctrl- Z to break
I then ran \windows-x86-64\InstallNTService.bat as an administrator and I am seeing the sonarQube services was launched, but failed to start.
Not sure what the problem is.
I believe you first ran \windows-x86-64\InstallNTService.bat successfully and then StartSonar.bat unsuccessfully (the inverse order of what you describe).
You probably have [this problem]: http://qualilogy.com/fr/wp-content/uploads/sites/2/2013/09/Sonar_ServiceLaunchError2.jpg
Windows could not start the Sonar service on Local Computer.
Error 1067: The process terminated unexpectedly.
In that case, the solution is to change the user/rights to launch the Sonar service: https://qualilogy.com/en/migrate-sonarqube-tomcat-to-windows-service/
Go to the Services window, find the Sonar service, and open the Properties windows to change the user it logs on as to one with sufficient permissions.
I was able to solve this problem by creating a new folder named “Temp” in C:\Windows\System32\config\systemprofile\AppData\Local\
The Log-File will show only
--> Wrapper Started as Service
Cleaning or creating >temp directory C:\Program Files (x86)\SonarQube\sonarqube\temp
<-- Wrapper Stopped
The SonarQube service was launched, but failed to start.
After a long search, I came up to this site http://zen-and-art-of-programming.blogspot.de/2013/03/installing-and-running-sonar-source.html.
Solution:
Navigate to C:\Windows\system32\config\systemprofile\AppData\Local\ and create the directory Temp
2: Set the user rights to full access
3: Run the StartNTService.bat

How to run jenkins slave on windows 2012 r2 x64?

We want to use jenkins to build some specific software on Windows 2012 R2 x64.
But when I trying to run it, master node fails whis this error:
Connecting to 192.168.1.27
Checking if Java exists
C:\Program Files\Java\jdk1.6.0_30\bin\java.exe -version returned 1.6.0.
Installing the Jenkins slave service
ERROR: Message not found for errorCode: 0xC00000AC
org.jinterop.dcom.common.JIException: Message not found for errorCode: 0xC00000AC
at org.jinterop.winreg.smb.JIWinRegStub.winreg_OpenHKLM(JIWinRegStub.java:102)
at hudson.util.jna.DotNet.isInstalled(DotNet.java:77)
at hudson.os.windows.ManagedWindowsServiceLauncher.launch(ManagedWindowsServiceLauncher.java:292)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:222)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:701)
Caused by: jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
at jcifs.smb.SmbTransport.send(SmbTransport.java:664)
at jcifs.smb.SmbSession.send(SmbSession.java:238)
at jcifs.smb.SmbTree.send(SmbTree.java:119)
at jcifs.smb.SmbFile.send(SmbFile.java:775)
at jcifs.smb.SmbFile.open0(SmbFile.java:989)
at jcifs.smb.SmbFile.open(SmbFile.java:1006)
at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142)
at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32)
at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187)
at rpc.ncacn_np.RpcTransport.attach(RpcTransport.java:91)
at rpc.Stub.attach(Stub.java:104)
at rpc.Stub.call(Stub.java:109)
at org.jinterop.winreg.smb.JIWinRegStub.winreg_OpenHKLM(JIWinRegStub.java:100)
and I don't know what is wrong.
Yes, I've read this carefully.
upd. ok. I removed server from domain. Now jenkins says:
Connecting to 192.168.1.27
Checking if Java exists
C:\Program Files\Java\jdk1.6.0_30\bin\java.exe -version returned 1.6.0.
Installing the Jenkins slave service
Copying jenkins-slave.exe
Copying slave.jar
Copying jenkins-slave.xml
Registering the service
Starting the service
Waiting for the service to become ready
ERROR: The service did not respond. Perhaps it failed to launch?
In EventViewer I see:
Service cannot be started. System.ComponentModel.Win32Exception: The system cannot find the file specified
at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo)
at winsw.WrapperService.StartProcess(Process process, String arguments, String executable)
at winsw.WrapperService.OnStart(String[] _)
at System.ServiceProcess.ServiceBase.ServiceQueuedMainCallback(Object state)
I had the same issue on on Windows 2012 R2 x64:
Installing the Jenkins slave service
ERROR: Message not found for errorCode: 0xC00000AC
org.jinterop.dcom.common.JIException: Message not found for errorCode: 0xC00000AC
at org.jinterop.winreg.smb.JIWinRegStub.winreg_OpenHKLM(JIWinRegStub.java:102)
at hudson.util.jna.DotNet.isInstalled(DotNet.java:77)
at hudson.os.windows.ManagedWindowsServiceLauncher.launch(ManagedWindowsServiceLauncher.java:292)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:228)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: jcifs.smb.SmbException: All pipe instances are busy.
at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563)
at jcifs.smb.SmbTransport.send(SmbTransport.java:664)
at jcifs.smb.SmbSession.send(SmbSession.java:238)
at jcifs.smb.SmbTree.send(SmbTree.java:119)
at jcifs.smb.SmbFile.send(SmbFile.java:775)
at jcifs.smb.SmbFile.open0(SmbFile.java:989)
at jcifs.smb.SmbFile.open(SmbFile.java:1006)
at jcifs.smb.SmbFileOutputStream.<init>(SmbFileOutputStream.java:142)
at jcifs.smb.TransactNamedPipeOutputStream.<init>(TransactNamedPipeOutputStream.java:32)
at jcifs.smb.SmbNamedPipe.getNamedPipeOutputStream(SmbNamedPipe.java:187)
at rpc.ncacn_np.RpcTransport.attach(RpcTransport.java:91)
at rpc.Stub.attach(Stub.java:104)
at rpc.Stub.call(Stub.java:109)
at org.jinterop.winreg.smb.JIWinRegStub.winreg_OpenHKLM(JIWinRegStub.java:100)
... 7 more
and have found out, that jenkins slave to be run as a service requires to have .net 3.x installed (which is not by default on win2012 servers).
After having the 3.5 .net framework installed jenkins slave service got installed without issues (and the server remained in domain).
To not loose time in case you'd hit problems while installing 3.5 framework as I did, refer to this SO answer: Offline installer for .Net 3.5 SP1 not working (disabling the WSUS helped me to get the installation through)
I had this issue:
Service cannot be started. System.ComponentModel.Win32Exception: The system cannot find the file specified
at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo)
at winsw.WrapperService.StartProcess(Process process, String arguments, String executable)
at winsw.WrapperService.OnStart(String[] _)
at System.ServiceProcess.ServiceBase.ServiceQueuedMainCallback(Object state)
And the problem went away when i changed Path to java executable from :
C:\ProgramData\Oracle\Java\javapath\javaw
to
C:\ProgramData\Oracle\Java\javapath\java
I think 'w' is appended by JENKINS anyway.
I have a Windows 8 slave, which does work. Fair chance the Java path is incorrect, mine is set to C:\Program Files (x86)\Java\jre7\bin\java (yes, no .exe or anything, adjust just the Program (x86) Files part if you want to use x64 version) in Jenkins. I also have remote root set (to C:\jenkins) and I have an environment variable HOME set to C:\jenkins\ (yes, one is with \ and the other without), but that's only to make it easy to find the files after installation.
My slave works, appears online and then after a while (of idling mostly) will have connection issues. Disconnecting and reconnecting will then sometimes give the 'All pipe instances are busy' error, in that case I just have to do Launch slave agent a couple of times. Found your question when trying to solve that particular issue...
First thing, you can Go to the slave machine, Go to Jenkins -> Manage -> Manage Nodes and select the Slave and launch via java web start.
by doing this you will download a .jnlp file and launch it using java configured by you.
Make sure you have configured path to javaws.exe in system variable "PATH"(with version 1.6 or higher).
It will launch a window and and display as "Connected".
Now you can click on "File" and install as service.
This fails giving you an exception if the machine does not have .NET 3.x so make sure you have installed it and then try it again.
This works pretty cleanly without any issues.
Hope this helps.
For me this issue was resolved by uninstalling old java.
I encountered a similar issue, when trying to run the slave(agent) from master(jenkins), after making one configuration change as below it worked fine,
the below settings needs to added in the agent node->configure.
JVM option -Djsse.enableSNIExtension=false
At Jenkins->Agent node, for a windows server 2012 slave configuration

Resources