Any way to print the task graph before Gradle actually performs the task?

Not exactly a newb, but I am trying to dig a bit deeper and understand a bit better ... just thought it might be a nice idea to do this.

Thanks to Opal I did this:
gradle.taskGraph.whenReady { taskGraph ->
    println "Tasks"
    taskGraph.getAllTasks().eachWithIndex { task, n ->
        println "${n + 1} $task"
        task.dependsOn.eachWithIndex { depObj, m ->
            println "    ${ m + 1 } $depObj"
        }
    }
}
Output for a Java build:
Tasks
1 task ':compileJava'
2 task ':processResources'
3 task ':classes'
    1 compileJava
    2 dirs
    3 processResources
4 task ':jar'
5 task ':assemble'
    1 org.gradle.api.internal.artifacts.DefaultPublishArtifactSet$ArtifactsTaskDependency#48db7705
6 task ':compileTestJava'
7 task ':processTestResources'
8 task ':testClasses'
    1 processTestResources
    2 dirs
    3 compileTestJava
9 task ':test'
10 task ':check'
    1 value: task ':test'
11 task ':build'
    1 check
    2 assemble
For me, as a Gradle neophyte (one up from a newb), this is great! Although it leaves me slightly puzzled:
1) "build" depends only on "check" and "assemble", and these have 1 dependency each, each with no dependencies. So how does it know to run all the other tasks (which obviously it has to)... I must be missing something.
2) What are the dependencies "dirs" and "org.gradle.api.internal.artifacts.DefaultPublishArtifactSet$ArtifactsTaskDependency#48db7705"? More importantly, where do these actually come from? getDependsOn() returns a Set<Object>, so these may be something other than Tasks.
Plenty to investigate...
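One avenue to explore for point 2 (an untested sketch): resolving each task's declared dependencies through getTaskDependencies() should turn entries like dirs into concrete Task objects:
gradle.taskGraph.whenReady { taskGraph ->
    taskGraph.allTasks.each { task ->
        // resolve dependsOn entries (Strings, TaskDependency objects, ...) into actual Tasks
        def resolved = task.taskDependencies.getDependencies(task)
        println "$task -> ${resolved*.path}"
    }
}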

The gradle-taskinfo plugin probably does just what Opal describes in his answer.
I wrote it, so I might be biased, but I use it a lot and it works great for me. 😊
Maybe it saves you some time coding!

The TaskExecutionGraph interface provides the getAllTasks method, which (from the docs):
Returns the tasks which are included in the execution plan. The tasks are returned in the order that they will be executed.
It seems that the output of this method might be used to print the task graph.

Related

How to run 1 playbook for the same group by multiple plays aka threaded

Our current setup has ~2000 servers (in one group).
I would like to know if there is a way to run x.yml on the whole group (where all the 2k servers are) but with multiple plays (threaded, or something), e.g.:
ansible-playbook -i prod.ini -l my_group[50%] x.yml
ansible-playbook -i prod.ini -l my_group[other 50%] x.yml
Solutions with AWX or Ansible Tower are not relevant.
Using even 500-1000 forks didn't give any improvement.
Try combining forks and the free strategy.
The default behavior of Ansible is:
Ansible runs each task on all hosts affected by a play before starting the next task on any host, using 5 forks.
So even if you increase the number of forks, the tasks will still wait for every host to finish before moving on to the next task. The free strategy allows each host to run until the end of the play as fast as it can:
- hosts: all
  strategy: free
  tasks:
    # ...
ansible-playbook -i prod.ini -f 500 -l my_group x.yml
As mentioned above, you should preferably increase forks and set the strategy to free. Increasing forks will help you run the playbook on more servers at a time, and setting the strategy to free allows each server to run through its tasks independently without waiting for the others.
Please refer to the doc below for more clarification.
docs
Resolved by using the patterns my_group[:1000] and my_group[999:] (see the sketch below).
Forks didn't give any time decrease in my case.
Also, the free strategy actually multiplied the run time, which was pretty weird.
And debugging the free strategy summary is quite difficult when you have 2k servers and about 50 tasks in the playbook.
Thanks everyone for sharing, much appreciated.
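For reference, a sketch of that pattern-based split using Ansible's numeric host-pattern slices (inventory, group and playbook names as in the question):
# first slice of the group
ansible-playbook -i prod.ini -l 'my_group[:1000]' x.yml
# remaining slice; slice bounds are inclusive, so check the boundary for overlap
ansible-playbook -i prod.ini -l 'my_group[999:]' x.yml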

How to set parallelism flag conditionally in circleci config.yml?

I have an e2e job set up in CircleCI. I would like to set the parallelism property to either 1 or 4 depending on a specific parameter, but I am not able to work out how to do this in terms of syntax. I tried this, but it does not work:
e2e:
  when:
    condition: << parameters.test_parameter >>
    parallelism: 4
  unless:
    condition: << parameters.test_parameter >>
    parallelism: 1
The above does not seem to be right. The build fails with a syntax error at the parallelism level, saying mapping values are not allowed. Any clue how to achieve this in CircleCI? TIA

Taskwarrior: How do I find the tasks that depend on a specific task?

How do I find out which task(s) depend on a specific task without reading the information of all tasks?
Reproduction
System
Version
$ task --version
2.5.1
.taskrc
# Taskwarrior program configuration file.
# Files
data.location=~/.task
alias.cal=calendar
rc.date.iso=Y-M-D
default.command=ready
journal.info=no
rc.regex=on
Here are the tasks that I created for testing purposes:
$ task list
ID Age Description Urg
1 2min Something to do 0
2 1min first do this 0
3 1min do this whenever you feel like it 0
3 tasks
Create the dependency from task#1 to task#2:
$ task 1 modify depends:2
Modifying task 1 'something to do'.
Modified 1 task.
$ task list
ID Age D Description Urg
2 4min first do this 8
3 4min do this whenever you feel like it 0
1 4min D Something to do -5
3 tasks
Goal
Now I want to find the tasks that are dependent on task#2, which should be task#1.
Trials
Unfortunately, this does not result in any matches:
$ task list depends:2
No matches.
$ # I can filter by blocked tasks
$ task blocked
ID Deps Age Description
1 2 18min Something to do
1 task
$ # But when I want to list only the tasks \
    that are blocked by task#2, task#3 is also returned
$ task blocked:2
[task ready ( blocked:2 )]
ID Age Description Urg
2 20min first do this 8
3 19min do this whenever you feel like it 0
2 tasks
Suggestions?
How would you approach this?
Parsing the Taskwarrior output through a script looks like a bit of overkill.
You have the right command, but you have actually encountered a bug: the depends attribute does not work with the "short id"; internally it's a comma-delimited string of UUIDs.
It will work if you use the UUID instead. Use task <id> _uuid to resolve an id to a UUID.
$ task --version
2.5.1
# Create tasks
$ task rc.data.location: add -- Something to do
$ task rc.data.location: add -- first do this
$ task rc.data.location: add -- do this whenever you feel like it
$ task rc.data.location: list
ID Age Description Urg
1 - Something to do 1.8
2 - first do this 1.8
3 - do this whenever you feel like it 1.8
3 tasks
# Set up dependency
$ task rc.data.location: 1 modify depends:2
Modifying task 1 'Something to do'.
Modified 1 task.
# Query using depends:UUID
$ task rc.data.location: list "depends.has:$(task rc.data.location: _get 2.uuid)"
ID Age D Description Urg
1 - D Something to do -3.2
1 task
# Query using depends:SHORT ID
# This does not work, despite documentation. Likely a bug
$ task rc.data.location: list "depends.has:$(task rc.data.location: _get 2.id)"
No matches.
A small correction to your attempt to find blocked tasks
There is no blocked attribute, and you're using the ready report.
$ task blocked:2
[task ready ( blocked:2 )]
The ready report will filter out what we're looking for; the blocked report is what we need. To unmagickify this: these are simply useful default reports that have preset filters on top of task all.
$ task show filter | grep -e 'blocked' -e 'ready'
report.blocked.filter status:pending +BLOCKED
report.ready.filter +READY
report.unblocked.filter status:pending -BLOCKED
Blocked tasks will have the virtual tag +BLOCKED, which is mutually exclusive to +READY.
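These virtual tags can also be used directly as a filter, for example (a quick sketch):
$ task +BLOCKED list
$ task +READY list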
The blocked attribute doesn't exist; use task _columns to show the available attributes (e.g. depends). Unfortunately, the CLI parser is probably attempting to apply the filter blocked:2 and ends up ignoring it. For your workflow, the useful command is task blocked "depends.has:$(task _get 2.uuid)". It's advisable to write a shell function to make it easier to use:
#!/bin/bash
# Untested but gets the point across
function task_blocked {
    blocker=$1
    shift
    task blocked "depends.has:$(task _get "${blocker}.uuid")" "$@"
}
# Find tasks of project "foo" that are blocked on task 2
task_blocked 2 project:foo
# What about other project that is also impacted
task_blocked 2 project:bar
You could use this taskwarrior hook script that adds a "blocks" attribute to the tasks: https://gist.github.com/wbsch/a2f7264c6302918dfb30

Dependency and condition order in Azure DevOps Pipelines

In an Azure Pipelines YAML file, when defining multiple jobs in a single stage, one can specify dependencies between them. One can also specify the conditions under which each job runs.
Code #1
jobs:
- job: A
  steps:
  - script: echo hello
- job: B
  dependsOn: A
  condition: and(succeeded(), eq(variables['build.sourceBranch'], 'refs/heads/master'))
  steps:
  - script: echo this only runs for master
Code #2
jobs:
- job: A
  steps:
  - script: "echo ##vso[task.setvariable variable=skipsubsequent;isOutput=true]false"
    name: printvar
- job: B
  condition: and(succeeded(), ne(dependencies.A.outputs['printvar.skipsubsequent'], 'true'))
  dependsOn: A
  steps:
  - script: echo hello from B
Question:
Code #1 and #2 above have different orderings of the dependsOn and condition keys. Does the order matter? If so, how? (What's the difference between the different orders?)
Let's discuss #1 and #2 separately.
Code #1:
There is no data connection between job A and job B; "data connection" here refers to things like variable sharing.
So for #1, the order does not matter. You can even omit dependsOn here if you have no special requirement on the execution order between job A and job B.
BUT, one key thing to pay attention to: the actual running order can change randomly when you do not specify dependsOn. Most of the time the jobs will run in the order job A, job B; occasionally they will run as job B, job A.
Code #2:
Here dependsOn must be specified, because job B uses an output variable that is created/generated in job A. Since the system allows the same variable name to exist in different jobs, you must specify dependsOn so that the system knows job B should look for the variable skipsubsequent in job A and not elsewhere. Only when this keyword is specified are the variables generated in job A exposed and made available to the following jobs.
So, in a nutshell: once there is any data connection between jobs, e.g. passing a variable, you must specify dependsOn so that the jobs are connected to each other.

Parallel processing with dependencies on a SGE cluster

I'm doing some experiments on a computing cluster. My algorithm has two steps. The first one writes its outputs to some files which will be used by the second step. The dependencies are 1 to n, meaning one step-2 program needs the output of n step-1 programs. I'm not sure what to do so that I neither waste cluster resources nor keep the head node busy. My current solution is:
submit script (this runs on the head node)
for different params, p:
    run step 1 with p
sleep some time based on an estimate of how long step 1 takes
for different params, q:
    run step 2 with q
step 2 algorithm (this runs on the computing nodes)
while files are not ready:
    sleep a few minutes
do step 2
Is there any better way to do this?
SGE provides both job dependencies and array jobs for that. You can submit your phase 1 computations as an array job and then submit the phase 2 computation as a dependent job using qsub -hold_jid <phase 1 job ID|name> .... This will make the phase 2 job wait until all the phase 1 computations have finished; only then will it be released and dispatched. The phase 1 computations will run in parallel as long as there are enough slots in the cluster.
In a submission script it might be useful to specify holds by job name and to name each array job in a unique way, e.g.:
mkdir experiment_1; cd experiment_1
qsub -N phase1_001 -t 1-100 ./phase1
qsub -hold_jid phase1_001 -N phase2_001 ./phase2 q1
cd ..
mkdir experiment_2; cd experiment_2
qsub -N phase1_002 -t 1-42 ./phase1 parameter_file
qsub -hold_jid phase1_002 -N phase2_002 ./phase2 q2
cd ..
This will schedule 100 executions of the phase1 script as the array job phase1_001 and another 42 executions as the array job phase1_002. If there are 142 slots on the cluster, all 142 executions will run in parallel. Then one execution of the phase2 script will be dispatched after all tasks in the phase1_001 job have finished and one execution will be dispatched after all tasks in the phase1_002 job have finished. Again those can run in parallel.
Each task in the array job will receive a unique $SGE_TASK_ID value, ranging from 1 to 100 for the tasks in job phase1_001 and from 1 to 42 for the tasks in job phase1_002. From it you can compute the p parameter.
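As a small illustration of that last point, a phase1 script might map $SGE_TASK_ID to its parameter like this (the params.txt file, its one-parameter-per-line layout, and the run_step1 executable are assumptions for the sketch, not part of the original setup):
#!/bin/bash
# pick the p value for this array task from a parameter file;
# task IDs start at 1, so line N of params.txt belongs to task N
p=$(sed -n "${SGE_TASK_ID}p" params.txt)
./run_step1 --param "$p"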
