ExitCode of RunProgramInGuest in Jenkins job - windows

I'm running a batch file in a virtual machine from a Jenkins job. I'm using the following command to run it:
..path..\vmrun.exe -T ws -gu username -gp password runProgramInGuest "c:\vm_image.vmx" -activeWindow -interactive "C:\Installer.bat"
The job runs correctly and installs the software (by running the batch file), but sometimes it exits with exit code 2, so Jenkins marks the job as failed.
What does exit code 2 mean for this job?
What are the other possible exit codes for this command, and what do they mean?
How should I determine whether the job passed or failed?

If I understood correctly what you ran, those are the VIX error codes:
0 – VIX_OK: the operation was successful.
1 – VIX_E_FAIL: unknown error.
2 – VIX_E_OUT_OF_MEMORY: memory allocation failed, out of memory.
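If Jenkins only tells you the job failed, a small wrapper around the call makes the actual code visible in the console log. A minimal sketch for an "Execute Windows batch command" build step, reusing the command from the question:
REM Run the installer inside the guest VM.
..path..\vmrun.exe -T ws -gu username -gp password runProgramInGuest "c:\vm_image.vmx" -activeWindow -interactive "C:\Installer.bat"
REM A non-zero code is a VIX error (e.g. 2 = VIX_E_OUT_OF_MEMORY); log it and fail the step.
if %ERRORLEVEL% neq 0 (
    echo vmrun failed with exit code %ERRORLEVEL%
    exit /b %ERRORLEVEL%
)
echo vmrun completed successfully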

Related

Snakemake does not recognise job failure due to timeout with error code -11

Has anyone had a problem with Snakemake recognizing a timed-out job? I submit jobs to a cluster using qsub, with a time-out set per rule:
snakemake --jobs 29 -k -p --latency-wait 60 --use-envmodules \
--cluster "qsub -l walltime={resources.walltime},nodes=1:ppn={threads},mem={resources.mem_mb}mb"
If a job fails within a script, the next one in line is executed. However, when a job hits the time-out defined in a rule, the next job in line is not executed, reducing the total number of jobs running in parallel on the cluster over time. According to the MOAB scheduler (PBS server), a timed-out job raises a -11 exit status. As far as I understand, any non-zero exit status means failure - or does this only apply to positive integers?
Thanks in advance for any hint:)
If you don't provide a --cluster-status script, Snakemake internally checks job status by touching hidden files from within the submitted job script. When a job times out, Snakemake (on the node) doesn't get a chance to report the failure to the main Snakemake instance, because qsub kills it first.
You can try a cluster profile or just grab a suitable cluster-status script (be sure to chmod it executable and have qsub report a parsable job id).
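A minimal sketch of such a cluster-status script for PBS/Torque, assuming qstat -f prints job_state and exit_status fields (state letters and field names vary between schedulers, so adapt to your site):
#!/usr/bin/env bash
# status.sh - print "success", "failed" or "running" for the job id Snakemake passes in
jobid="$1"
state=$(qstat -f "$jobid" 2>/dev/null | awk -F' = ' '/job_state/ {print $2}')
case "$state" in
  C)  # completed: decide from the recorded exit status
      rc=$(qstat -f "$jobid" 2>/dev/null | awk -F' = ' '/exit_status/ {print $2}')
      if [ "$rc" = "0" ]; then echo success; else echo failed; fi ;;
  "") # scheduler has forgotten the job (e.g. killed at walltime)
      echo failed ;;
  *)  echo running ;;
esac
Hook it in with snakemake --cluster "qsub ..." --cluster-status ./status.sh; a job killed at its walltime is then reported as failed instead of leaving the main Snakemake instance waiting.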

Impossible to run the program with the "mpirun" command

I built the program with Cygwin just fine; however, when I run the .exe file with the "mpirun" command as the program's tutorial says
https://github.com/jalombar/starsmasher/blob/master/documentation/walkthroughs/star_star_flyby.md
the following error appears:
$ mpirun -np 4 test_cpu_sph
[Francyrad:00524] PMIX ERROR: INIT in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.5-1.x86_64/src/openmpi-3.1.5/opal/mca/pmix/pmix2x/pmix/src/mca/gds/ds21/gds_ds21_lock_pthread.c at line 188
[Francyrad:00524] PMIX ERROR: SUCCESS in file /cygdrive/d/cyg_pub/devel/openmpi/v3.1/openmpi-3.1.5-1.x86_64/src/openmpi-3.1.5/opal/mca/pmix/pmix2x/pmix/src/mca/common/dstore/dstore_base.c at line 2432
--------------------------------------------------------------------------
Open MPI tried to fork a new process via the "execve" system call but
failed. Open MPI checks many things before attempting to launch a
child process, but nothing is perfect. This error may be indicative
of another problem on the target host, or even something as silly as
having specified a directory for your application. Your job will now
abort.
Local host: Francyrad
Application name: /cygdrive/c/Users/Francyrad/Desktop/starsmasher/GAM1.667_N1.5
Error: /cygdrive/c/Users/Francyrad/Desktop/starsmasher/GAM1.667_N1.5/test_cpu_sph
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an
error:
Error name:
Node: (null)
when attempting to start process rank 34361314336.
--------------------------------------------------------------------------
4 total processes failed to start
[Francyrad:00524] 3 more processes have sent help message help-orte-odls-default.txt / execve error
[Francyrad:00524] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I tried everything, changing the syntax and more, but nothing works! I've no idea how to make this application run. What the hell do I have to do?
Hmm, the executable seems to be the problem...
Was your executable built correctly? Do you see it with an "ls" command?
Did you try to call the executable with mpirun using "mpirun -np 4 ./test_cpu_sph" instead of "mpirun -np 4 test_cpu_sph"?
And could you give a bit more information, please? (If you're using Cygwin, I guess you're running on Windows?)
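A quick sketch of those checks from the Cygwin shell (test_cpu_sph is the binary name from the question):
ls -l test_cpu_sph            # does the binary exist in the current directory?
file test_cpu_sph             # is it really an executable, not a script or a failed link?
chmod +x test_cpu_sph         # ensure the execute bit is set
mpirun -np 4 ./test_cpu_sph   # explicit ./ path so mpirun does not rely on $PATH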

What are the different ways to check if the mapreduce program ran successfully

If we need to automate a MapReduce program or run it from a script, what are the different ways to check whether it ran successfully? One way is to check if a _SUCCESS file is created in the output directory. Does the command "hadoop jar program.jar hdfs:/input.txt hdfs:/output" return 0 or 1 based on success or failure?
Just like any other command in Linux, you can check the exit status of a hadoop jar command using the built-in variable $?.
You can run:
echo $?
after executing the hadoop jar command to check its status.
The exit status ranges from 0 to 255. An exit status of zero means the command executed successfully, while a non-zero value indicates that it failed.
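A sketch combining both checks from the question, the hadoop jar exit status and the _SUCCESS marker file (paths are the ones from the question):
hadoop jar program.jar hdfs:/input.txt hdfs:/output
if [ $? -ne 0 ]; then
    echo "MapReduce job failed" >&2
    exit 1
fi
# belt and braces: also verify the _SUCCESS marker in the output directory
if hadoop fs -test -e hdfs:/output/_SUCCESS; then
    echo "MapReduce job succeeded"
else
    echo "job exited 0 but no _SUCCESS marker found" >&2
    exit 1
fi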
Edit: To see how to achieve automation or to run from a script, refer Hadoop job fails when invoked by cron.

Matlab bad performance under Jenkins

I have a Jenkins script execution step that processes output data with Matlab to evaluate test results.
When running the script from a command prompt it starts up and exits quite fast, but when executing the same script with the same arguments from Jenkins it performs extremely poorly: I get the Matlab welcome message in the "prompt only" window that appears, but nothing else within the 2-hour timeout I have set for the job.
I have disabled the Jenkins Windows service on the node and am running the node process from the desktop, but there is no difference:
C:\Windows\System32\java.exe -jar c:\j-mpc\slave.jar -jnlpUrl http://<server>/slave-agent.jnlp -secret <xxxxx>
I also tried to increase the memory for the node process, but no change:
C:\Windows\System32\java.exe -Xmx2048m
When killing the process tree starting with bash, it shows that it is inherited from the java.exe-sh.exe tree (Process Explorer window), but there is a missing PID in between:
java.exe (<0.01%, 1 420 000K)
  sh.exe (<0.01%, 2 140K)
    bash.exe (<0.01%, 2 580K)
      bash.exe ( , 2 580K)
        python.exe ( , 6 044K)
          python.exe ( , 4 800K)
            matlab.exe ( , 1 844K)
              MATLAB.exe (<0.01%, 167 324K)
Is there a hidden limitation on child processes that restricts memory or process usage when they are called from Jenkins? I don't see the same limitation in other jobs. Memory allocation for Matlab is very slow (growing from startup to a reasonable size of >100M takes about a minute).
(I have a screen dump from Process Explorer but I am not allowed to upload it.)
EDIT
I have also tried to reduce the call to a single Windows command line from Jenkins (I suspected that the deep call stack was to blame), but with the same result:
matlab.exe -nodisplay -nosplash -nodesktop -wait -logfile "log_file.txt" -r "try script_file ;catch err; disp(err.message); end ; exit"
Solved by setting the LM_LICENSE_FILE environment variable in the Jenkins node setup.
(Found a thread about slow startup.)
Apparently the shell environment started by Jenkins does not completely match the one started from Explorer.
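For reference, a sketch of the same fix applied in a batch build step rather than the node configuration, assuming a FlexLM-style license server (the port@host value is a placeholder for your own server):
REM Make the license server visible to MATLAB; the Jenkins shell environment
REM may be missing variables that interactive sessions inherit.
set LM_LICENSE_FILE=27000@licenseserver.example.com
matlab.exe -nodisplay -nosplash -nodesktop -wait -logfile "log_file.txt" -r "try script_file ;catch err; disp(err.message); end ; exit"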

Running remotely Linux script from Windows and get execution result code

I have the following scenario to deal with:
I have to schedule the backup of my company's Linux-based server (running SUSE Linux) with ARCServe R15 (installed on Windows 2003 R2 SP2).
I know I have the ability in my backup software (ARCServe) to add pre/post-execution scripts to my backup jobs.
On script failure, ARCServe can be configured NOT to run the backup job, and on success, to run it. I have no problem with this.
The problem is, I want to make a Windows script (to be launched by ARCServe) that executes a Linux script on the cluster:
- If the Linux script fails, I want my Windows script to fail, so my backup job in ARCServe won't run.
- If the Linux script succeeds, I want my Windows script to end normally with error code 0, so my ARCServe job runs normally.
I've tried creating this batch file (let's call it HPC.bat):
echo ON
start /wait "C:\Program Files\PUTTY\plink.exe" -v -l root -i "C:\IST\admin\scripts\HPC\pri.ppk" [cluster_name] /appli/admin/backup_admin
exit %errorlevel%
If I manually launch this .bat by double-clicking on it, or by running it in a command prompt under Windows, it executes normally and then ends.
If I have it launched by ARCServe, the script never seems to end.
My job stays in "waiting" status; it seems the exit code of the Linux script isn't returned to my batch file, so the batch never closes.
In my mind, what's happening is that plink just opens the connection to the Linux box, sends the script-execution signal, and then closes the connection, so the exit code can't be returned to the batch. Am I right?
Is what I want to do possible, or am I attempting something impossible?
Do I have to proceed differently?
Do I have to use PuTTY or Cygwin instead of plink?
Please help, this is giving me headaches...
If you install Cygwin, you could do it exactly as you would from Linux to Linux, i.e. remotely run a command with ssh someuser@remoteserver.com somecommand
This command returns on the calling client with the same return code that the command exited with on the remote end. If you use SSH shared keys for authentication instead of passwords, it can also be scripted without user interaction.
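A minimal sketch of that approach, using the host placeholder and script path from the question (the key path is illustrative; BatchMode stops ssh from prompting interactively):
#!/usr/bin/env bash
# run the remote script and propagate its exit code to the Windows caller
ssh -o BatchMode=yes -i /path/to/private_key root@cluster_name /appli/admin/backup_admin
rc=$?
echo "remote script exited with status $rc"
exit $rc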
