Agent is slow and hung in TeamCity

One of the agents keeps running and does not stop. I executed a one-line PowerShell script, write-host "Hello World", and it takes 57 seconds to finish. Please refer to the screenshots below.
Collecting changes from the VCS root in the initial step also takes almost 1 to 4 minutes.
The PowerShell script here only runs write-host "Hello World", which is one line. We are not trying to install any module.
To troubleshoot the issue, I tried the following but got stuck finding the logs option:
To check the logs, I navigated to Agent, but I do not see a Log tab there to dig further. What needs to be done to get the log option there?
And is the folder in the screenshot below the <agent_home_dir>/logs folder you are referring to? If I have multiple agents, how do I figure out a particular agent's home directory and its logs folder?
I need to figure out why it is slow and how to fix it. Please suggest what needs to be done.

Related

View Xcode bot logs while bot is running

I waste a lot of time when running Xcode bots and I just want to see if I have it configured correctly. My test suite takes 5 minutes to run, so having to wait that amount of time each time I tweak a setting until I can see the results is not ideal. Is there any way I can see the logs as the bot is running?
An alternative approach would be some way to run just a single test, if that's possible. Obviously I could remove/comment all other tests, but I'm looking for a faster way.
This is a bit tricky to do, but possible.
Xcode Server stores bot log information in /Library/XcodeServer/IntegrationAssets/<bot_name_here>/.
Within this directory, you will find a numbered folder for each integration (folders named 1/, 2/, 3/, etc.), and within each of those folders you will find the following files (not necessarily limited to these, but this is what I see):
buildService.log
sourceControl.log
trigger-before-0.log
...etc
However, this directory is only accessible if you are the root user. If you really want to take a look at logs while bots are running, you can assume root on your server machine with the following command (server password required):
sudo su -
then you can navigate to the above directory and observe the log files as they are being written.
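As a rough sketch of "observing the log files as they are being written", the helper below returns only the complete lines appended to a file since the last call. The function name read_new_lines is hypothetical, and the script would still need to run as root to read the Xcode Server directory described above.

```python
def read_new_lines(path, offset=0):
    """Return (lines, new_offset) for complete lines appended since offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline so a half-written line
    # is left for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Calling it in a loop with a short sleep between calls, passing the returned offset back in each time, gives a behaviour similar to tail -f on a file such as buildService.log.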

Jenkins Build Never Finishing

I have a Jenkins master/slave set up which has been working quite happily, running Oracle imports on some Linux boxes.
I have just added a new slave node and tried to run our existing database import job on this new node. This job consists of three subprojects; the first one runs some execute shells, copying files and changing permissions and this currently completes successfully, the second runs an execute shell which ends with an Oracle impdp. The impdp completes (the db exists and ps -ef no longer shows impdp running) but the Jenkins subproject never finishes. The UI just sits there with the clock whirring around.
I've tried adding an echo after the impdp, and this also executes correctly, but the subproject still never finishes.
If I add a Post-Build email notification, it is not sent.
The third subproject is never reached.
What could be the cause of this and how do I debug what is happening?
In our case, the jobs would declare "Finished: SUCCESS", but then continue with some unknown Jenkins business for another 10 or 20 minutes. After putting on more detailed logging, we found it was related to the ill-named LogRotator.
We have thousands of old builds and are deleting the artifacts for those older than a certain number of days. Because of the way old builds are handled, Jenkins searches the entire list of old builds even though they have already had their artifacts removed.
There is an issue related to this that is now fixed: https://issues.jenkins-ci.org/browse/JENKINS-22607
As of right now I do not see it in a release, but if you have this issue, the temporary workaround is to turn off the deletion.
This turned out to be something horrible :-)
After finishing the work, Jenkins tries to kill all processes it spawned. To identify them, it goes through all processes in the OS, reads from /proc/<pid>/environ (this is a Linux box) which contains the process’ environment variables and compares them with the environment it sets to Jenkins processes.
The problem was one particular Oracle process running on our db server: if you tried to read /proc/<pid>/environ for it, the read would just hang forever, which is where the Jenkins code would get stuck.
I have no idea why it was getting stuck like this and nor did our DBA. We restarted it and now it works.
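To make the mechanism above concrete, here is a minimal sketch of the matching step, not Jenkins' actual code. /proc/<pid>/environ holds NUL-separated KEY=VALUE entries; the function names and the marker variable EXAMPLE_BUILD_COOKIE are hypothetical, chosen only for illustration.

```python
def parse_environ(raw):
    """Parse NUL-separated KEY=VALUE bytes, as found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty or malformed entries
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def spawned_by(env, marker):
    """True if every marker variable is present in env with the same value."""
    return all(env.get(k) == v for k, v in marker.items())


# Hypothetical marker; the real variables Jenkins compares are not shown here.
marker = {"EXAMPLE_BUILD_COOKIE": "abc123"}
sample = b"PATH=/usr/bin\0EXAMPLE_BUILD_COOKIE=abc123\0"
print(spawned_by(parse_environ(sample), marker))
```

A scanner built on this would read each /proc/<pid>/environ in turn, so a single file whose read never returns, as described above, would stall the whole scan. The sketch does not add any timeout around the read.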
You can add set -x to the top of shell scripts to see which commands are actually executed (set +x turns the tracing off again). That way you should be able to easily see from the output which command is blocking.

How to run a specified bat file at particular time ie.. scheduler

I have been using the at command to schedule the task,
e.g. at 14:45 my.bat
and I get this output at the command prompt:
"JOB ID is added"
But the command does not fire at the time I scheduled.
Can anyone please help me?
I suspect the issue is not that the BAT file is not executing at all, but rather either or both of i) individual commands within the BAT file are failing, or ii) the output isn't getting sent to the place you're looking for it. (Things get even weirder if anything in the batch file requests input, since a) by default batch files may not be able to interact with the "console" at all and b) the system is probably unattended anyway at the time the batch file executes.) If my suspicion is right, there is no one "fix-everything" but rather a whole bunch of small fixes ...and you have to hit every one of them.
Find out the password needed to actually log in as 'admin' (rather than your usual user), open a DOS box, and try to run the batch file. There should be some sort of error message that you can see. Fix that problem. Then try again and fix the next problem. Keep correcting problems and trying again until finally everything works.

jenkins started with all jobs lost, trying to use 'copy existing job' feature

The CI server was disconnected from the network for a while for some strange reason, and when it came back up, Jenkins displayed no jobs. However, in the directory where the jobs live, /var/lib/jenkins/jobs/, the two jobs that should appear are there, but they don't show up at all in the web client.
I tried using 'copy existing job' and pointed it to /var/lib/jenkins/jobs/existing_test, but it tells me: no such job /var/lib/jenkins/jobs/existing_test
Any suggestions as to how to get this to work?
I know this question may be outdated, but a possible solution is to run Jenkins under the appropriate user (the one it ran as previously). This helped me.
I ended up just building the jobs brand new; I wasn't able to find a fix.
First I would look in the Jenkins logs. As your data is in /var/lib/jenkins, I would guess your log files are in /var/log/jenkins. Maybe you can find out what's wrong from there.
You could also try the "load configuration from disk" link in the "manage jenkins" view. That should reload the configuration files from your directories and may bring your jobs back. In any case, you should be able to see something in your logs. If the logs are empty, check file permissions; I sometimes had problems with that after updating.
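One way to act on the "check file permissions" advice is sketched below, assuming the script is run as the same user Jenkins runs as. The function name unreadable_paths is hypothetical; the directory to pass in would be the Jenkins data directory mentioned in the question.

```python
import os


def unreadable_paths(root):
    """Return paths under root that the current user cannot read."""
    found = []
    if not os.access(root, os.R_OK):
        found.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # os.access follows symlinks, so a dangling link is reported too.
            if not os.access(path, os.R_OK):
                found.append(path)
    return found
```

For example, unreadable_paths("/var/lib/jenkins/jobs") returning a non-empty list would suggest that the jobs exist on disk but cannot be loaded by that user. Note that running it as root tells you little, because root can read almost everything.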

Hudson CI Server configuration gone very wrong

I am not sure if this is a true SO question really, so will understand if it gets closed or moved.
I am new to hudson, I have had it up and running for a few weeks now and so far have been very pleased. It is running on a Windows 2008 X64 machine as a windows service.
The WS2008 runs on VMware ESXI4.0, as well as another WS2008 and an Ubuntu Server. Last night the two windows servers suddenly stopped responding, MSTSC, file share, web access, it all stopped. The ESXI server still responded as I could browse to the home page and see its install guide. Also the Ubuntu machine continued to work normally, I ssh'd onto it, the Apache server was running and Samba still responded normally.
In the end I had to reboot the physical box to get it all back up again.
Once I did, the servers came back up, but Hudson has now lost its settings. What is weird is that it still asks me to log in, and the username and password still work, so it knows that user is set up.
The user was set up to be an admin user so I could manage the whole site. I had three build jobs set up and had them building each night.
Now when I log in I do not see half the options on the left hand menu and there are no jobs.
I am not really sure where to start with this to try and solve it.
I could really do with some help and guidance.
Thank you
Jon
EDIT
OK, so I can narrow my question down now.
If I remove the user security it shows me all the builds and I can manage the system again.
<?xml version='1.0' encoding='UTF-8'?>
<hudson>
<version>1.341</version>
<numExecutors>2</numExecutors>
<mode>NORMAL</mode>
<useSecurity>false</useSecurity>
<authorizationStrategy class="hudson.security.GlobalMatrixAuthorizationStrategy">
<permission>hudson.scm.SCM.Tag:Jon</permission>
<permission>hudson.model.View.Configure:Jon</permission>
<permission>hudson.model.Computer.Configure:Jon</permission>
<permission>hudson.model.Item.Configure:Jon</permission>
<permission>hudson.model.Item.Create:Jon</permission>
<permission>hudson.model.Run.Delete:Jon</permission>
<permission>hudson.model.Computer.Delete:Jon</permission>
<permission>hudson.model.View.Delete:Jon</permission>
<permission>hudson.model.Hudson.Read:anonymous</permission>
<permission>hudson.model.Hudson.Read:Jon</permission>
<permission>hudson.model.Run.Update:Jon</permission>
<permission>hudson.model.Hudson.Administer:Jon</permission>
<permission>hudson.model.Item.Build:Jon</permission>
<permission>hudson.model.Item.Read:Jon</permission>
<permission>hudson.model.Item.Delete:Jon</permission>
<permission>hudson.model.Item.Workspace:Jon</permission>
<permission>hudson.model.View.Create:Jon</permission>
</authorizationStrategy>
<securityRealm class="hudson.security.HudsonPrivateSecurityRealm">
<disableSignup>true</disableSignup>
</securityRealm>
This is the line I changed; it was previously true.
<useSecurity>false</useSecurity>
My user name is "Jon", which I can still log on with, but I can only see the following options:
Build History
My Views
Leader Board
If I try and go directly to "/manage" I get access denied.
Second Edit
Fixed it, I removed all security, went in and re added the User then it seemed to remember everything.. very odd.
Third Edit
Didn't fix it, but found out what the original problem is. It is forgetting my user settings, so even if I re add "Jon" back in with all privileges after a restart it forgets it all again.
All of hudson's configuration information lives in XML files in the hudson home directory. You didn't tell us how you deploy it (winstone? jetty?), but you have to have a home directory somewhere.
It's rather hard to imagine hudson deleting these files.
I would make a new, clean, install of hudson somewhere else and compare it to the state of your broken installation.
I once lost data when I edited the job XMLs (every job has its own config.xml) and broke the XML structure. That will prevent Hudson from loading the jobs. You might find some information about what is going wrong in the log files (HUDSON_HOME\*.log).
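A quick way to test for the broken-XML case is to try parsing each config.xml. The sketch below assumes the usual layout of a top-level config.xml plus jobs/<job name>/config.xml under the home directory; the function name find_broken_xml is hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET


def find_broken_xml(home_dir):
    """Return [(path, error)] for config.xml files that are not well-formed."""
    broken = []
    # Assumed layout: <home>/config.xml and <home>/jobs/<name>/config.xml
    pattern = os.path.join(home_dir, "jobs", "*", "config.xml")
    paths = [os.path.join(home_dir, "config.xml")] + sorted(glob.glob(pattern))
    for path in paths:
        if not os.path.isfile(path):
            continue
        try:
            ET.parse(path)
        except ET.ParseError as exc:
            broken.append((path, str(exc)))
    return broken
```

This only checks that the files are well-formed XML. A file can parse cleanly and still contain settings that Hudson rejects, so the log files remain the first place to look.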
