ducttape sometimes-skip task: cross-product error - bash

I'm trying a variant of sometimes-skip tasks for ducttape, based on the tutorial here:
http://nschneid.github.io/ducttape-crash-course/tutorial5.html
([ducttape][1] is a Bash/Scala based workflow management tool.)
I'm trying to do a cross-product to execute task1 on "clean" data and "dirty" data. The idea is to traverse the same path, but without preprocessing in some cases. To do this, I need to do a cross-product of tasks.
task cleanup < in=(Dirty: a=data/a b=data/b) > out {
prefix=$(cat $in)
echo "$prefix-clean" > $out
}
global {
data=(Data: dirty=(Dirty: a=data/a b=data/b) clean=(Clean: a=$out#cleanup b=$out#cleanup))
}
task task1 < in=$data > out
{
cat $in > $out
}
plan FinalTasks {
reach task1 via (Dirty: *) * (Data: *) * (Clean: *)
}
Here is the execution plan. I would expect 6 tasks, but I have two duplicate tasks being executed.
$ ducttape skip.tape
ducttape 0.3
by Jonathan Clark
Loading workflow version history...
Have 7 previous workflow versions
Finding hyperpaths contained in plan...
Found 8 vertices implied by realization plan FinalTasks
Union of all planned vertices has size 8
Checking for completed tasks from versions 1 through 7...
Finding packages...
Found 0 packages
Checking for already built packages (if this takes a long time, consider switching to a local-disk git clone instead of a remote repository)...
Checking inputs...
Work plan (depth-first traversal):
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./cleanup/Baseline.baseline (Dirty.a)
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./cleanup/Dirty.b (Dirty.b)
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./task1/Baseline.baseline (Data.dirty+Dirty.a)
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./task1/Dirty.b (Data.dirty+Dirty.b)
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./task1/Clean.b+Data.clean+Dirty.b (Clean.b+Data.clean+Dirty.b)
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./task1/Data.clean+Dirty.b (Clean.a+Data.clean+Dirty.b)
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./task1/Data.clean (Clean.a+Data.clean+Dirty.a)
RUN: /nfsmnt/hltfs0/data/nicruiz/slt/IWSLT13/analysis/workflow/tmp/./task1/Clean.b+Data.clean (Clean.b+Data.clean+Dirty.a)
Are you sure you want to run these 8 tasks? [y/n]
Removing the symlinks from the output below, my duplicates are here:
$ head task1/*/out
==> Baseline.baseline/out <==
1
==> Clean.b+Data.clean/out <==
1-clean
==> Data.clean/out <==
1-clean
==> Clean.b+Data.clean+Dirty.b/out <==
2-clean
==> Data.clean+Dirty.b/out <==
2-clean
==> Dirty.b/out <==
2
Could someone with experience with ducttape assist me in finding my cross-product problem?
[1]: https://github.com/jhclark/ducttape

So why do we have 4 realizations involving the branch point Clean at task1 instead of just two?
The answer to this question is that the in ducttape branch points are always propagated through all transitive dependencies of a task. So the branch point "Dirty" from the task "cleanup" is propagated through clean=(Clean: a=$out#cleanup b=$out#cleanup). At this point the variable "clean" contains the cross product of the original "Dirty" and the newly-introduced "Clean" branch point.
The minimal change to make is to change
clean=(Clean: a=$out#cleanup b=$out#cleanup)
to
clean=$out#cleanup
This would give you the desired number of realizations, but it's a bit confusing to use the branch point name "Dirty" just to control which input data set you're using -- with only this minimal change, the two realizations of the task "cleanup" would be (Dirty: a b).
It may make your workflow even more grokkable to refactor it like this:
global {
raw_data=(DataSet: a=data/a b=data/b)
}
task cleanup < in=$raw_data > out {
prefix=$(cat $in)
echo "$prefix-clean" > $out
}
global {
ready_data=(DoCleanup: no=$raw_data yes=$out#cleanup)
}
task task1 < in=$ready_data > out
{
cat $in > $out
}
plan FinalTasks {
reach task1 via (DataSet: *) * (DoCleanup: *)
}

Related

Keep workspace when switching stages in combination with agent none

I have a Jenkins pipeline where I want to first build my project (Stage A) and trigger an asynchronous long running external test process with the built artifacts. The external test process then resumes the Job using a callback. Afterwards (Stage B) performs some validations of the test results and attaches them to the job. I don't want to block an executor while the external test process is running so I came up with the following Jenkinsfile which mostly suites my needs:
#!/usr/bin/env groovy
pipeline {
agent none
stages {
stage('Stage A') {
agent { docker { image 'my-maven:0.0.17' } }
steps {
script {
sh "rm testfile.txt"
sh "echo ABCD > testfile.txt"
sh "cat testfile.txt"
}
}
}
stage('ContinueJob') {
agent none
input { message "The job will continue once the asynchronous operation has finished" }
steps { echo "Job has been continued" }
}
stage('Stage B') {
agent { docker { image 'my-maven:0.0.17' } }
steps {
script {
sh "cat testfile.txt"
def data = readFile(file: 'testfile.txt')
if (!data.contains("ABCD")) {
error("ABCD not found in testfile.txt")
}
}
}
}
}
}
However, depending on the load of the Jenkins or the time passed or some unknown other conditions, sometimes the files that I create in "Stage A" are no longer available in "Stage B". It seems that Jenkins switches to a different Docker node which causes the loss of workspace data, e.g. in the logs I can see:
[Pipeline] { (Stage A)
[Pipeline] node
Running on Docker3 in /var/opt/jenkins/workspace/TestJob
.....
[Pipeline] stage
[Pipeline] { (Stage B)
[Pipeline] node
Running on Docker2 in /var/opt/jenkins/workspace/TestJob
Whereas with a successful run, it keeps using e.g. node "Docker2" for both stages.
Note that I have also tried reuseNode true within the two docker sections but that didn't help either.
How can I tell Jenkins to keep my workspace files available?
As pointed out by the comment from #Patrice M. if the files are not that big (which is the case for me) stash/unstash are very useful to solve this problem. I have used this combination now since a couple of months and it has solved my issue.

Run specific part of Cypress test multiple times (not whole test)

Is it possible to run specific part of the test in Cypress over and over again without execution of whole test case? I got error in the second part of test case and first half of it takes 100s. It means I have to wait 100s every time to get to the point where the error occurs. I would like rerun test case just few steps before error occurs. So once again, my question is: Is it possible to do in Cypress? Thanks
Workaround #1
If you are using cucumber in cypress you can modify your scenario to a Scenario Outline that will execute Nth times with a scenario tag:
#runMe
Scenario Outline: Visit Google Page
Given that google page is displayed
Examples:
| nthRun |
| 1 |
| 2 |
| 3 |
| 4 |
| 100 |
After that run the test in the terminal by running through tags:
./node_modules/.bin/cypress-tags run -e TAGS='#runMe'
Reference: https://www.npmjs.com/package/cypress-cucumber-preprocessor?activeTab=versions#running-tagged-tests
Workaround #2
Cypress does have retry capability but it would only retry the scenario during failure. You can force your scenario to fail to retry it Nth times with a scenario tag:
In your cypress.json add the following configuration:
{
"retries": {
// Configure retry attempts for `cypress run`
// Default is 0
"runMode": 99,
// Configure retry attempts for `cypress open`
// Default is 0
"openMode": 99
}
}
Reference: https://docs.cypress.io/guides/guides/test-retries#How-It-Works
Next is In your feature file, add an unknown step definition on the last step of your scenario to make it fail:
#runMe
Scenario: Visit Google Page
Given that google page is displayed
And I am an uknown step
Then run the test through tags:
./node_modules/.bin/cypress-tags run -e TAGS='#runMe'
For a solution that doesn't require adding a change to the config file, you can pass retries as a param to specific tests that are known to be flakey for acceptable reasons.
https://docs.cypress.io/guides/guides/test-retries#Custom-Configurations
Meaning you can write (from docs)
describe('User bank accounts', {
retries: {
runMode: 2,
openMode: 1,
}
}, () => {
// The per-suite configuration is applied to each test
// If a test fails, it will be retried
it('allows a user to view their transactions', () => {
// ...
}
it('allows a user to edit their transactions', () => {
// ...
}
})```

Parallel execution 'mvn test' in Jenkins

I try to create jenkinsfile for parallel execution command mvn test with different arguments. On the first stage of jenkinsfile I create *.csv file where are will be future arguments for mvn test command. Also I don't know the quantity of parallel stages (it depends on first stage where I get data from DB). So, summarize it again. Logic:
First stage for getting data from DB over command mvn test (with args). On this test I save data into csv file.
In loop of jenkinsfile I read every string, parse it and get args for arallel execution mvn test (with args based on the parsed data).
Now it looks like this (only necessary fragments of jenkinsfile):
def buildProject = { a, b, c ->
node {
stage(a) {
catchError(buildResult: 'FAILURE', stageResult: 'FAILURE') {
sh "mvn -Dtest=test2 test -Darg1=${b} -Darg2=${c}"
}
}
}
}
stages {
stage('Preparation of file.csv') {
steps {
sh 'mvn -Dtest=test1 test'
}
}
stage('Parallel stage') {
steps {
script {
file = readFile "file.csv"
lines = file.readLines()
def branches = [:]
for(i = 0; i < lines.size(); i++) {
values = lines[i].split(';')
branches["${values[0]}"] = { buildProject(values[0], values[1], values[2]) }
}
parallel branches
}
}
}
}
So, which problems do I face now with?
I see in log following error:
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/Data/jenkins/workspace//#2)
I look at workspaces of Jenkins and see that there were created several empty(!!!) directories (quantity equals to quantity of parallel stages). And therefore mvn command could be executed because of absence of pom.xml and other files.
In branches the same data are saved on every iteration of loop and in 'stage(a)' I see the same title (but every iteration of loop has unique 'values[0]').
Can you help me with this issue?
Thank you in advance!
So, regarding this jenkins issue issues.jenkins.io/browse/JENKINS-50307 and workaround which could be found there, task could be closed!

Parallel items in Jenkins Declarative pipeline

I am working on setting up automated build and deploy jobs in jenkins
Right now I have a single stage with parallel tasks set up like this
stage ('Testing & documenting'){
steps {
parallel (
"PHPLOC" : {
echo "Running phploc"
sh "./src/vendor/phploc/phploc/phploc --exclude=./src/vendor --no-interaction --quiet --log-csv=./build/logs/loc.csv src tests"
},
"SLOC": {
echo "Running sloc"
sh "sloccount --duplicates --wide --details . > ./build/logs/sloccount.sc 2>/dev/null"
},
"CPD" : {
echo "Running copy-paste detection"
sh "./src/vendor/sebastian/phpcpd/phpcpd --fuzzy . --exclude src/vendor --log-pmd ./build/logs/phpcpd.xml || true"
},
"MD" : {
echo "Running mess detection on code"
sh "./src/vendor/phpmd/phpmd/src/bin/phpmd src xml phpmd_ruleset.xml --reportfile ./build/logs/phpmd_code.xml --exclude vendor,build --ignore-violations-on-exit --suffixes php"
},
"PHPUNIT" : {
echo "Running PHPUnit w/o code coverage"
sh "./src/vendor/phpunit/phpunit/phpunit --configuration phpunit-quick.xml"
}
)
}
}
after reading https://jenkins.io/blog/2018/07/02/whats-new-declarative-piepline-13x-sequential-stages/ I noticed that they use a different structure.
stage("Documenting and Testing") {
parallel {
stage("Documenting") {
agent any
stages {
stage("CPD") {
steps {
//CPD
}
}
stage("PMD") {
steps {
//PMD stuff
}
}
}
stage("Testing") {
agent any
stages {
stage("PHPUnit") {
steps {
//PHPUnit
}
}
}
}
I am not sure what the difference between these two approaches is
The first example running parallel inside the steps block was introduced by the earlier versions of the declarative pipeline. This had some shortcomings. For example, to run each parallel branch on a different agent, you need to use a node step, and if you do that, the output of the parallel branch won’t be available for post directives (at a stage or pipeline level). Basically the old parallel step required you to use Scripted Pipeline within a Declarative Pipeline.
The second example is a true declarative syntax introduced to overcome the shortcomings of the former. In addition, this particular example runs two serial stages within the parallel stage ‘Documenting’.
You can read this official blog to know more about the parallel directive https://jenkins.io/blog/2017/09/25/declarative-1/.

Only run task if another isn't UP-TO-DATE in gradle

I want to automatically add a serverRun task when doing functional tests in Gradle, so I add a dependency :
funcTestTask.dependsOn(serverRun)
Which results in the task running whether or not the funcTestTask even runs
:compile
:serverRun
:funcTestTask (and associate compile tasks... etc)
:serverStop
OR
:compile UP-TO-DATE
:serverRun <-- unnecessary
:funcTestTask UP-TO-DATE
:serverStop
The cost of starting the server is pretty high and I only want it to start if the functionalTest isn't UP-TO-DATE, I'd like to do or something :
if(!funcTestTask.isUpToDate) {
funcTestTask.dependsOn(serverRun)
}
So I know I can't know the up-to-date status of funcTestTask until all it's inputs/outputs are decided BUT can I inherit it's uptoDate checker?
serverRun.outputs.upToDateWhen(funcTestTask.upToDate)
The alternative is to "doFirst" the ServerRun in the FuncTest, which I believe is generally frowned upon?
funcTestTask.doFirst { serverRun.execute() }
Is there a way to conditionally run a task before another?
UPDATE 1
Tried settings inputs/outputs the same
serverRun.inputs.files(funcTestTask.inputs.files)
serverRun.outputs.files(funcTestTask.outputs.files)
and this seems to rerun the server on recompiles (good), skips reruns after successful unchanged functional tests (also good), but wont rerun tests after a failed test like the following
:compile
:serverRun
:funcTestTask FAILED
then
:compile UP-TO-DATE
:serverRun UP-TO-DATE <-- wrong!
:funcTestTask FAILED
Having faced the same problem I found a very clean solution. In my case I want an eclipse project setup to be generated when the build is run, but only at the times when a new jar is generated. No project setup should be executed when the jar is up to date. Here is how one can accomplish that:
tasks.eclipse {
onlyIf {
!jar.state.upToDate
}
}
build {
dependsOn tasks.eclipse
}
Since the task is a dependent tasks of the one you are trying to control then you can try:
tasks {
onlyIf {
dependsOnTaskDidWork()
}
}
I ended up writing to a 'failure file' and making that an input on the serverRun task:
File serverTrigger = project.file("${buildDir}/trigger")
project.gradle.taskGraph.whenReady { TaskExecutionGraph taskGraph ->
// make the serverRun task have the same inputs/outputs + extra trigger
serverRun.inputs.files(funcTestTask.inputs.files, serverTrigger)
serverRun.outputs.files(funcTestTask.outputs.files)
}
project.gradle.taskGraph.afterTask { Task task, TaskState state ->
if (task.name == "funcTestTask" && state.failure) {
serverRun.trigger << new Date()
}
}
With information from an answer to my question on the Gradle forums :
http://forums.gradle.org/gradle/topics/how-can-i-start-a-server-conditionally-before-a-functionaltestrun
I had the same problem but the solution I came up is much simpler.
This starts up the server only if testing is necessary
test {
doFirst {
exec {
executable = 'docker/_ci/run.sh'
args = ['--start']
}
}
doLast {
exec {
executable = 'docker/_ci/run.sh'
args = ['--stop']
}
}
}
Assuming you have task1 and task2 which depends on task1 and you need to run task2 only if task1 is not up-to-date, the following example may be used:
task task1 {
// task1 definition
}
task task2(dependsOn: task1) {
onlyIf { task1.didWork }
}
In this case task2 will run only when task1 is not up-to-date. It's important to use didWork only for tasks which are defined in dependsOn, in order to ensure that didWork is evaluated after that task (task1 in our example) had chance to run.

Resources