Tiered Parallel Execution w/ Failure Recovery in Jenkins

In the figure below, I want each level of jobs to run in parallel (as many as can run simultaneously on the available executors), and if one arbitrary job fails, I want things to return to normal after I fix the problem (as if the job had never failed). That is, if the failed job builds successfully after the fix, I want the jobs at the lower levels to start automatically.
I have found that the Build Flow Plugin cannot achieve this. I hope someone has some brilliant ideas to share.
Thanks for your time.
For Further Clarification:
All the jobs at level x must succeed before any job at level x+1 starts. If some job at level x fails, I do not want any job at level x+1 to start. After fixing the problem, I re-run the job, and if it succeeds (and all the other jobs at level x have also succeeded), then I want level x+1 to start building.

Referencing your diagram, I'll restate the requirements of your question (to make sure I understand it).
At Level 1, you want all of the jobs to run in parallel (if possible)
At Level 2, you want all of the jobs to run in parallel
At Level 3, you want all of the jobs to run in parallel
Any successful build of a Level 1 job should cause all Level 2 jobs to build
Any successful build of a Level 2 job should cause all Level 3 jobs to build
My answer should work if you do not require "Any failure at Level 1 will prevent all Level 2 jobs from running."
I don't believe this will require any other plugins. It just uses what is built into Jenkins.
For each Level 1 job, configure a Post Build action of "Build other projects"
The projects to build should be all of your Level 2 jobs, as a comma-separated list.
Check "Trigger only if build succeeds"
For each Level 2 job, configure a Post Build action of "Build other projects"
The projects to build should be all of your Level 3 jobs.
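If you are open to alternatives, Jenkins Pipeline expresses this tiering directly, and re-running the pipeline after a fix restarts the whole flow from the top. A minimal scripted-pipeline sketch (all job names are hypothetical):

```groovy
// Each level runs its jobs in parallel; a level starts only after every
// job in the previous level succeeded, because a failed 'build' step
// aborts the pipeline.
node {
    stage('Level 1') {
        parallel(
            'A': { build job: 'level1-jobA' },
            'B': { build job: 'level1-jobB' }
        )
    }
    stage('Level 2') {
        parallel(
            'C': { build job: 'level2-jobC' },
            'D': { build job: 'level2-jobD' }
        )
    }
    stage('Level 3') {
        parallel(
            'E': { build job: 'level3-jobE' }
        )
    }
}
```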

Related

Modifying number of tasks executed on mesos slave

In a Mesos ecosystem (master + scheduler + slaves), with the master executing tasks on the slaves, is there a configuration that allows modifying the number of tasks executed on each slave?
Say, for example, a Mesos master currently runs 4 tasks on one of the slaves (each task using 1 CPU). We have 4 slaves (4 cores each), and apart from this one slave the other three are not being used.
Instead of this execution scenario, I'd prefer the master to run 1 task on each of the 4 slaves.
I found this Stack Overflow question and these configurations relevant to this case, but I am still not clear on how to use the --isolation=VALUE or --resources=VALUE configuration here.
Thanks for the help!
I was able to reduce the number of tasks executed on a single host at a time by adding the following properties to the startup script for the Mesos agent:
--resources="cpus:<<value>>" and --cgroups_enable_cfs=true.
This, however, does not take care of the concurrent-scheduling requirement of having every agent executing a task at the same time. For that, you need to look into the scheduler code, as also suggested above.
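For illustration, an agent startup line with those flags might look like this (the master address, work directory, and CPU count are placeholders):

```sh
# Advertise only 1 CPU so the master schedules at most one 1-CPU task
# here, and enforce the limit through the CFS cgroup controller.
mesos-agent --master=zk://master-host:2181/mesos \
            --work_dir=/var/lib/mesos \
            --isolation=cgroups/cpu,cgroups/mem \
            --resources="cpus:1" \
            --cgroups_enable_cfs=true
```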

TWS - what is the better way to schedule job to run twice a day...?

I need to schedule a job to run twice a day, every day: the first run at 1 AM and the second run at 11 PM. As you can see, there is no way I can use the "repeat range" option.
So my question is what is the better way to do it:
1. set two run cycles (DAILY and DAILY2), check the "use as time dependency" and set the earliest start for each
2. create two identical jobs (with different names) and set the time restriction on the job level (under one run cycle DAILY)
It seems like there is no difference, but with TWS you never know, so what do you recommend? Thanks.
I would use 2 different run cycles with different start times defined at the run-cycle level.
The use of "use as time dependency" is optional in this scenario; it is up to you whether the job must not start before 1 AM / 11 PM, or whether this is just to create 2 instances with 2 different schedTimes and resolve the dependencies according to those schedTimes.
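For illustration only, a rule-based job stream with two daily run cycles might look roughly like this in composer syntax (workstation, stream, and job names are hypothetical, and the exact syntax depends on your TWS version):

```
SCHEDULE MYWKS#TWICEDAILY
ON RUNCYCLE DAILY1 "FREQ=DAILY;" AT 0100
ON RUNCYCLE DAILY2 "FREQ=DAILY;" AT 2300
:
MYWKS#MYJOB
END
```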

How can you run more than one simultaneous job in Ansible Tower?

It seems that all jobs are enqueued, and only one will run at a time. How can we run more than one?
Tower is designed to parallelize jobs, but there are a couple of cases where it will not.
If you have your inventory or SCM set up to "update on launch" with no cache, or the cache has expired, then any additional jobs will be stuck pending behind the inventory or SCM update. The inventory and SCM will not update until after the currently running job is done.
If you are trying to run multiple jobs against the same host: Tower will not run multiple jobs against the same host at the same time in order to avoid race conditions. (localhost is a possible exception). If you need multiple jobs to run against the same host at the same time then you need to create two inventories and put that host in both inventories, running the two jobs against different inventories. In this situation, Tower does not know that you are running against the same host.
Jobs which share the same Inventory or SCM source can not run at the same time.
Suppose you have a job comprised of three tasks:
task 1: "do x", task 2: "do y", task 3: "do z"
With ansible "do x" will run on all the servers, then "do y" will run on all the servers, then "do z" will run on all the servers.
Also, I said "all serves" but in fact it maxes out at the ansible "forks" value, which defaults to 5. In my 100 server enviroment I set this value to 20. more on this here: http://docs.ansible.com/intro_configuration.html#forks
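For example, the fork count can be raised in ansible.cfg (the value 20 simply mirrors the one mentioned above):

```ini
# ansible.cfg: run each task on up to 20 hosts at once
[defaults]
forks = 20
```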
Remember the strength of ansible is doing a job ( a collection of tasks) on many machines at the same time. If what you want is to run the same task many times on a single machine, then you want something like fork, or parallel.
In fact Ansible will try to run "do x" on as many machines as it can at once. You can adjust this behavior, having the whole job run on a portion of the machines before it starts on more, with the "serial" keyword (http://docs.ansible.com/playbooks_delegation.html#rolling-update-batch-size).
Note the subtle difference between forks and serial:
forks is "per task"
serial is "per job" (a collection of tasks)
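A playbook sketch showing serial (the host group and task are placeholders):

```yaml
# Run the whole play to completion on 10 hosts at a time before
# moving on to the next batch of 10.
- hosts: webservers
  serial: 10
  tasks:
    - name: do x
      command: /bin/true
```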
Edit: I re-read your question. This is about running more than one job at a time, not running more than one task in a job. So I think you are correct for ansible-awx, but not for the command line. Via the web interface you can submit a job to the job queue, but you can't make ansible-awx run more than one job at a time, I think. However, via the command line, if you open more than one window you can run multiple ansible-playbooks at the same time. Do you have an Ansible support account? Those guys are great IMHO; they have taken a lot of time to answer my questions (like your question).
Simultaneous jobs can be executed from Tower. Job templates have an "Enable Concurrent Jobs" option. See section "15.4. Job Concurrency" at http://docs.ansible.com/ansible-tower/latest/html/userguide/jobs.html.
If you have 3 different tasks running on a single server, that is called synchronous mode: the 3 tasks are assigned to a single job ID, and each task executes one after the other, which consumes a lot of time.
In Ansible versions later than 2.5 we can get 3 job IDs for 3 different tasks and start executing them at the same time, which saves a huge amount of time. This is called asynchronous mode.
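A minimal sketch of asynchronous mode (the command and timings are placeholders):

```yaml
# Start a long-running task without waiting for it, then poll its
# job ID until it finishes.
- hosts: all
  tasks:
    - name: start a long job in the background
      command: /usr/bin/long_job   # hypothetical command
      async: 3600                  # allow it to run up to an hour
      poll: 0                      # do not wait; register a job ID
      register: long_job

    - name: wait for the job to finish
      async_status:
        jid: "{{ long_job.ansible_job_id }}"
      register: result
      until: result.finished
      retries: 30
      delay: 60
```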

Matrix-Job: Build whole matrix on same node

I'm using several matrix-jobs which usually contain the following steps:
Build, Install, Test
The Build step is set as a touchstone step (the matrix project's "Execute touchstone builds first" option). The other steps use the binaries created by Build.
I recently added another node to my system which should build these matrix-jobs too. Now my problem is, that Jenkins is distributing the steps of my job to these nodes.
Example:
1. Slave A runs the `Build` step and succeeds.
2. Slave B runs the `Install` step and fails due to its dependency on the `Build` results.
3. Slave A runs the `Test` step and succeeds, because its dependencies exist.
The execution of the matrix job fails because its steps were distributed across nodes.
My question now is if there is any way to bind the execution of a matrix-job to just one node. It's no problem if different executions are done on different nodes, but the steps of a certain execution should be done on a certain node.
Binding the matrix job to just one node is not a solution; it should still be bound to a group of nodes.
Since you have these steps as individual jobs, in your "label" axis:
make sure you choose individual nodes, not labels, for each of your steps.
This will make sure that each of your steps runs on each individual slave, and therefore each step will have its predecessor's workspace.
See:
http://imagebin.org/163627
==========================================================================
Based on comments:
At this point, you have two options:
You can use the Copy Artifact Plugin: https://wiki.jenkins-ci.org/display/JENKINS/Copy+Artifact+Plugin. Add everything required as an artifact of your "build" step, and have your "install" step copy it over using the plugin. Do the same for "install" => "test".
Combine your steps into a single job, since there is no guarantee that the same node will be "least used" for each step if they are different jobs. The only way to force all jobs to use the same node is by selecting individual node and not label.
Hope that helps...
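If moving away from the matrix project is an option, a Jenkins Pipeline sidesteps the problem entirely: everything inside one node block runs on the single node, chosen from a label group, that acquired the executor. A sketch with a hypothetical label and placeholder scripts:

```groovy
// All three stages run on whichever node in the 'build-nodes' label
// group picks up the run, so they share one workspace.
node('build-nodes') {
    stage('Build')   { sh './build.sh' }
    stage('Install') { sh './install.sh' }
    stage('Test')    { sh './test.sh' }
}
```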

How to share the BUILD_NUMBER between jobs with Hudson

I have separated a big Hudson job into smaller jobs. Job A does the main build and Job B does another build with a different configuration. I have configured Hudson so that A triggers B, and that works fine; the problem is that Job A has the original build number while B has started from 1.
My question is: Is it possible to pass the BUILD_NUMBER environment variable somehow from Job A to Job B? The build number is used in the build artifact names, hence it would be nice to have the numbers match between artifacts.
Thanks.
Use the Parameterized Trigger Plugin, which will allow you to pass the build number from A to B. You will not be able to actually set the build number in job B, but you will have the build number from A available to generate your version number.
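For example, in job A's "Trigger parameterized build on other projects" post-build action you could add a "Predefined parameters" entry like the one below (the parameter name is arbitrary); job B then sees it as an environment variable and can use it in its artifact names:

```
UPSTREAM_BUILD_NUMBER=${BUILD_NUMBER}
```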
If you want to synchronize the build numbers, you can edit the nextBuildNumber file in the job directory to match the number from job A. Be aware that these numbers will drift apart over time, since when A fails, B will not be started.
EDIT: I just stumbled across the Next Build Number Plugin. Have a look and see if it helps you.
