Jenkins Build Never Finishing - Oracle

I have a Jenkins master/slave setup which has been working quite happily, running Oracle imports on some Linux boxes.
I have just added a new slave node and tried to run our existing database import job on it. The job consists of three subprojects: the first runs some execute-shell steps that copy files and change permissions, and this currently completes successfully; the second runs an execute shell which ends with an Oracle impdp. The impdp completes (the database exists and ps -ef no longer shows impdp running), but the Jenkins subproject never finishes. The UI just sits there with the clock whirring around.
I've tried adding an echo after the impdp, and this also executes correctly, but the subproject still never finishes.
If I add a Post-Build email notification, it is not sent.
The third subproject is never reached.
What could be the cause of this and how do I debug what is happening?

In our case, the jobs would declare "Finished: SUCCESS" but then continue with some unknown Jenkins business for another 10 or 20 minutes. After turning on more detailed logging, we found it was related to the ill-named LogRotator.
We have thousands of old builds and are deleting the artifacts for those older than a certain number of days. Because of the way old builds are handled, Jenkins searches the entire list of old builds even though their artifacts have already been removed.
There is an issue, now fixed, related to this: https://issues.jenkins-ci.org/browse/JENKINS-22607
As of right now I do not see the fix in a release, so if you have this issue, the temporary workaround is to turn off the deletion.
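If turning off the deletion leaves you without any cleanup at all, one possible stopgap is to prune old artifacts outside Jenkins instead. A minimal sketch, assuming a default Linux install with JENKINS_HOME at /var/lib/jenkins; the path and the 30-day retention are assumptions, not Jenkins defaults:

    # Delete archived artifacts older than 30 days without involving LogRotator.
    # Adjust the JENKINS_HOME path and retention window for your setup.
    find /var/lib/jenkins/jobs/*/builds/*/archive -type f -mtime +30 -delete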

This turned out to be something horrible :-)
After finishing the work, Jenkins tries to kill all processes it spawned. To identify them, it goes through all processes in the OS, reads /proc/<pid>/environ (this is a Linux box), which contains each process's environment variables, and compares them with the environment it sets for the processes it spawns.
The problem was one particular Oracle process running on our DB server: any attempt to read its /proc/<pid>/environ would just hang forever, which is where the Jenkins code got stuck.
I have no idea why it was hanging like this, and neither did our DBA. We restarted the process and now it works.
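If you suspect the same failure mode, you can hunt for the offending PID by reading every process's environ with a timeout. A minimal sketch (Linux-specific, and it assumes GNU coreutils' timeout is available):

    # Try reading each process's environment with a 2-second timeout.
    # Exit status 124 means the read timed out - that PID is a candidate
    # for the hang described above; plain permission failures are ignored.
    for e in /proc/[0-9]*/environ; do
        timeout 2 cat "$e" > /dev/null 2>&1
        [ $? -eq 124 ] && echo "read hangs on $e"
    done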

You can add set -x to the top of shell scripts to see which commands are actually executed. That way you should be able to see easily from the output which command is blocking.
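For example, at the top of the Jenkins execute-shell step (the commands below are hypothetical placeholders):

    #!/bin/bash
    set -x                                        # echo each command, prefixed with '+', before it runs
    cp /staging/export.dmp /import/               # hypothetical copy step
    impdp system/secret@db DUMPFILE=export.dmp    # hypothetical import; the last echoed line before the hang is your blocker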

Related

Is there a way to prevent a bash script from running certain commands if the script has to be run again?

I have a bash script that works at the moment. It fetches an image and JDK 8 from a link and then runs an installer for JDK 8 before moving on to setting up another piece of software.
While debugging the script, I kept finding myself having to delete directories and even the Java installation, because when I introduce a fix and rerun the script, I have to wait for everything to download again, and I have to worry about duplicate files messing up my current logic (which can probably be improved, but I'll take that to the Stack Exchange Code Review site later).
At the moment, I would like to know what approaches there are to prevent commands, like downloading the JDK and running the JDK installer script, from running all over again.
What kind of general approaches are out there for cases such as these?
For the JDK download and installer, I did think of simply checking for the existence of java on the system and, if it is there, having bash skip those commands.
However, there are other commands I do not want rerun, and I do want to simply check, for example, for the existence of certain files to avoid wget-ing them all over again and moving them around, causing duplicates. (Should I maybe suck it up and do that anyway, as that might be best practice?)
I also thought of writing a 1 to a text file after each successful command and mapping each line in that text file to a command in the script (using an if statement to see whether that command had a 1 or not); if it was a 0, the script would know to run only that command and never the 1s.
That sounded clunky to me, and I am pretty sure it is not a good approach.
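A minimal sketch of the existence-check and sentinel-file patterns discussed above (all file names and URLs are hypothetical):

    #!/bin/bash
    set -euo pipefail

    JDK_TGZ=jdk-8-linux-x64.tar.gz

    # Skip the download if the archive is already present.
    [ -f "$JDK_TGZ" ] || wget "https://example.com/downloads/$JDK_TGZ"

    # Skip the install if java is already on the PATH.
    if ! command -v java > /dev/null 2>&1; then
        sudo tar xzf "$JDK_TGZ" -C /opt
    fi

    # Sentinel-file variant for arbitrary one-time steps.
    if [ ! -f .image-fetched ]; then
        wget "https://example.com/images/base.img" && touch .image-fetched
    fi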

Jenkins Timeout because of long script execution

I have some issues with Jenkins running a PowerShell script. Long story short: the script takes about 8x the execution time under Jenkins that it takes when run manually on the server (slave), where it finishes in just a few minutes.
I'm wondering why.
The script contains functions which invoke commands like & msbuild.exe or & svn commit. I found out that the script hangs on the lines where the aforementioned commands are executed. The result is that Jenkins times out because the script takes that long. I could raise the timeout threshold in the Jenkins job configuration, but I don't think that is the solution to the problem.
There is no error output or any information about why it takes that long, and I have no further ideas about the reason. Maybe one of you could tell me how Jenkins invokes those commands internally.
This is what Jenkins does (Windows batch plugin):
powershell -File %WORKSPACE%\ScriptHead\DeployOrRelease.ps1
I created my own PowerShell CI service before I found that Jenkins supports this with its own plugin. But in my implementation, and in my current job configs, we follow a simple segregation rule: more steps are better. I found that my CI service works better when it is separated into different steps (and in case of error, root-cause analysis is a lot easier). The single responsibility principle is also helpful here. So, as in Jenkins, we have pre-, post-, build, and email steps as separate scripts.
About msbuild.exe: as far as I remember, in my case there were issues related to operations on file-system paths. When the script was divided into different functions (with additional checks of parameters), we had better performance.
Use "divide and conquer" technique. You have two choices: modify your script so that will display what is doing and how much it takes for every step. Second option is to make smaller scripts to perform actions like:
get the source code,
compile/build the application,
run the tests,
create a package,
send the package,
archive the logs,
send a notification.
The most problematic is usually the first step: getting the source code from Git, SVN, Mercurial, or whatever you have as a version control system. Make sure this step is not embedded in your script.
During the job run, Jenkins captures the output and uses AJAX to display the result in your browser. In the script, make sure you flush standard output for every step or every few steps; some languages buffer standard output, so you can see the results only at the end.
You can also create log files, which are helpful for archiving and verifying the activity status of older runs. From my experience, using Jenkins with more than 10 steps requires you to create a specialized application that can run multiple steps, like Robot Framework.
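The question is about PowerShell, but the per-step pattern looks much the same in any shell. A hedged sketch with hypothetical step scripts, echoing progress markers so Jenkins' console shows live output and teeing each step into its own log:

    #!/bin/bash
    set -euo pipefail
    mkdir -p logs
    for step in checkout build test package notify; do
        echo "=== $step started: $(date) ==="            # printed line by line, so the console stays live
        ./steps/"$step".sh 2>&1 | tee "logs/$step.log"   # per-step log for later root-cause analysis
        echo "=== $step finished: $(date) ==="
    done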

VBScript - How to know when complete?

I'm running a simple/single vbscript in Windows Scheduler to perform 13 individual file exports from our SalesForce app. The script runs as expected. Depending upon network traffic, the 13 exports take 3-5 minutes total to complete.
My intent was to run these exports serially, but vbscript seems happy to run them in parallel. SalesForce accommodates with no issue or complaint.
Upon successful completion of the Export, I run a second vbscript to import these results into another application (via an msaccess function). This second vbscript also provides the desired result.
Question: Is there any way to programmatically determine when the Export script has completed, so I can safely kick off the Import script? Currently I have set up a second Scheduler job to run the Import script 10 minutes after the separate Export script, but this could fail. I am looking to tie these two scripts more closely to one another.
Any suggestions?
Thanks!
There are a couple of options. If both scripts are running on the same system with the same permissions, you could have the first script actually kick off the second script whenever it's finished.
If the scripts require different permissions, or you need them to start from a task manager, have your first script start by looking for an existing file such as SCRIPT1.COMPLETE. If that file exists, have script1 delete it and start processing; when script1 finishes its processing, create the file. Then in script2, create a while loop that looks for SCRIPT1.COMPLETE. If the file is not there, hold off for a few seconds and try again; don't exit the while loop until the complete file shows up. Have script2 delete the COMPLETE file when it finishes processing. I would recommend setting your "wait a while" delay to at least 30 seconds or so, so that your script isn't constantly checking.
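The scripts here are VBScript, but the handshake itself is language-agnostic. A minimal shell sketch of the same sentinel-file protocol (file name as above, everything else hypothetical):

    # --- end of the export script ---
    touch SCRIPT1.COMPLETE            # signal that all exports are done

    # --- start of the import script ---
    while [ ! -f SCRIPT1.COMPLETE ]; do
        sleep 30                      # "wait a while" before checking again
    done
    rm -f SCRIPT1.COMPLETE            # consume the signal
    # ... run the import here ...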

Jenkins started with all jobs lost, trying to use 'copy existing job' feature

The CI server was disconnected from the network for a while for some strange reason, and when it came back up, Jenkins displayed no jobs. However, in the directory where the jobs live, /var/lib/jenkins/jobs/, the two jobs that should appear are there, but they show no evidence of existence in the web client.
I tried using the 'copy existing job' feature and pointed it to /var/lib/jenkins/jobs/existing_test, but it tells me: no such job /var/lib/jenkins/jobs/existing_test
Any suggestions as to how to get this to work?
I know this question may be outdated, but a possible solution is to run Jenkins under the appropriate user (the one it ran under previously). This helped me.
I ended up just building the jobs brand new; I wasn't able to find a fix.
First I would try looking in the Jenkins logs; as your data is in /var/lib/jenkins, I would guess your log files are in /var/log/jenkins. Maybe you can find out what's wrong from there.
You could also try the "Reload Configuration from Disk" link in the "Manage Jenkins" view. That should reload the configuration files from your directories and maybe bring your jobs back. In any case, you should be able to see something in your logs. If the logs are empty, check file permissions; I used to have problems with that after updating sometimes.
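Hedged, concrete versions of those checks, assuming a default Linux package install (the paths and the jenkins user name vary by distribution):

    sudo tail -n 100 /var/log/jenkins/jenkins.log        # look for load errors on startup
    ls -l /var/lib/jenkins/jobs/                         # job dirs should belong to the jenkins user
    sudo chown -R jenkins:jenkins /var/lib/jenkins/jobs  # only if ownership turned out wrong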

Changing environment in Hudson so that it stays for the whole build

How can I execute a batch file or just a couple of commands in a Hudson job (running on Windows XP, as a non-service, but that may change) so that the environment stays for the whole build?
I need to do this because I have to change the current path with 'cd' (we are using relative paths in our project) and 'set' some environment variables for msbuild.
Thank you in advance.
I'm not sure why you need to get out of the service realm. My understanding so far is that Hudson starts a new environment for every job, so that jobs don't interfere with each other. So if you don't use commands that affect other environments (e.g. subst), you will be fine adding an "Execute Windows batch command" build step; cd and set changes made there persist for the rest of that step.
If your service runs with the wrong permissions, you have two options: change the permissions of the service (run it under a different user than the local system user), or call the runas command. If for whatever reason you still need to contain changes to certain parts of your job, you can always call cmd to create a new environment.
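A shell analog of that point, since the same scoping holds for an "Execute shell" step on a Unix node (the path, variable, and script below are hypothetical): changes made inside one build step last until that step ends, and the next step starts fresh.

    # All of this runs inside a single build step:
    cd build/relative/path        # hypothetical relative path, like the 'cd' above
    export CONFIG=Release         # hypothetical variable, like the 'set' above
    ./compile.sh                  # sees both the new working directory and CONFIG
    # The next build step starts with a fresh environment again.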
