What is the difference between the vars_files directive and the include_vars module? Which should be used when, and is either of them deprecated or discouraged?
Both vars_files and include_vars are labeled as stable interfaces, so neither is deprecated. They have some commonalities, but they serve different purposes.
vars_files:
The vars_files directive can only be used when defining a play, to specify variable files. The variables from those files are made available to the play. Since it is evaluated at the start of the play, it implies that the files were either created by an earlier play or existed statically before the run; in other words, they act as configuration variables for the play.
include_vars:
vars_files serves the single purpose of including variables from a set of files, but it cannot handle cases where:
- The vars files were created dynamically and you want to include them in the play.
- You want to include vars in a limited scope.
- You have multiple vars files and want to include them based on some criterion, e.g. if a local database exists, include its configuration; otherwise include the configuration of a remotely hosted database.
- You want to override default configuration (vars): include_vars has higher precedence than vars_files.
- You want lazy evaluation: include_vars files are read at the point where the task runs.
- You want to include vars dynamically using a loop.
- You want to read a file and put all of its variables into a named dictionary instead of the global variable namespace.
- You want to include all files in a directory, or a subset of them (based on a prefix or an exclusion list), without knowing the exact names of the vars file(s).
These are some of the cases I can think of; if you need any of them, you need include_vars.
vars_files are read when the play starts. include_vars are read when the play reaches the task. You might also be interested in Variable precedence: Where should I put a variable?
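A minimal sketch of the two mechanisms side by side (the file names, the db_config dictionary name, and the local_db_exists variable are hypothetical):

```yaml
- hosts: all
  # Static: read when the play starts
  vars_files:
    - defaults.yml
  tasks:
    # Dynamic: read only when this task runs, and only if the condition holds
    - name: Load local DB settings if a local database is present
      include_vars:
        file: local_db.yml
        name: db_config        # put the vars into a named dictionary
      when: local_db_exists | default(false)
```

Note how include_vars can be conditional and can namespace its result, neither of which vars_files supports.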
Some updated information...
import_* is considered static reuse; vars_files is a type of import.
include_* is considered dynamic reuse.
As Vladimir mentioned:
vars_files are read when the play starts. include_vars are read when the play reaches the task.
Like all static items, vars_files are read before the play starts; include_vars, by contrast, are only "included" when the play reaches that task.
One of the biggest differences between static reuse and dynamic reuse is how the variables or tasks within them are processed. All static reuse items are processed with the linear strategy by default: all hosts stay in lockstep with each other, and each task has to complete on ALL hosts before the next task can begin. Hosts that are skipped actually get a noop task to process.
Dynamic reuse does not change the strategy from linear, but it does change the order in which tasks are processed. With dynamic reuse, the entire group of tasks must complete on a single host before they are processed by the next host. Unfortunately, all the other hosts get to twiddle their noops while they wait.
Include statements are good when you need to 'loop' a host through a series of tasks with registered outputs and do something with that information before the next host starts.
Import statements are good when you need to collect information or perform a task on a group of hosts before the next task can start for any host.
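As a sketch of the distinction (the task file names and the sites variable are made up for illustration):

```yaml
- hosts: webservers
  tasks:
    # Static reuse: parsed up front, runs in lockstep across all hosts
    - import_tasks: common_checks.yml

    # Dynamic reuse: evaluated at runtime, supports loops and per-host flow
    - include_tasks: per_site_setup.yml
      loop: "{{ sites | default([]) }}"
```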
Here is a really good table that compares all the different include_* and import_* functions: Comparing includes and imports: dynamic and static re-use.
Just as an FYI, here is a link to more information about performance strategies and how you can improve performance. How can I improve performance for network playbooks?
I've just started learning Ansible and have a question about what defines a "play". If I'm looking at a template, my understanding is that a play typically starts with the following:
- name: xxx
  hosts: yyy
I've read that the name: keyword is not required (but highly recommended). So with that being the case, how can I tell where one play ends and the next one begins? What keywords delimit/demarcate a single play when there are multiple plays in a playbook?
Thanks,
Andy
Plays:
A playbook is a list of plays. A play is minimally a mapping between a set of hosts selected by a host specifier (usually chosen by groups but sometimes by hostname globs) and the tasks which run on those hosts to define the role that those systems will perform. There can be one or many plays in a playbook.
A play refers to the set (one or more) of actions (tasks) you want to execute on a set (one or more) of hosts.
As for the syntax of a play, you can find examples of single- and multiple-play playbooks here.
Thanks for that link, franklinsjo. That cleared things up. In summary, here's the takeaway:
A play consists of:
- name: (optional, but recommended)
  hosts:
  tasks:
So, as you're looking through a playbook, anytime you come to the keywords hosts: and tasks:, that indicates the start of a play (along with the optional name: keyword).
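For example, here is a playbook with two plays; each play begins at a new top-level list item with its own hosts: (the group and package names are hypothetical):

```yaml
# Play 1
- name: Configure web servers
  hosts: webservers
  tasks:
    - name: Ensure nginx is installed
      package:
        name: nginx
        state: present

# Play 2 starts at the next top-level list item
- hosts: dbservers
  tasks:
    - name: Ensure postgresql is installed
      package:
        name: postgresql
        state: present
```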
Thanks,
Andy
This is a common case, but it doesn't seem straightforward in Ansible.
Let's assume of a hierarchy of groups:
linux-hosts:
application-hosts:
foobar-application-hosts:
foobar01
Now for each of these groups we want to define a set of cron jobs.
For linux-hosts, jobs that run on all Linux hosts.
For application-hosts, jobs that run only on application hosts.
For foobar-application-hosts, jobs that run only on foobar-application-hosts.
The variable name is cronjobs, say, and it's a list of cron module settings.
By default, the foobar-application-hosts would clobber the setting for anything above it. Not good.
I don't see an easy way to merge (on a specific level). So I thought, all right, perhaps Ansible exposes the individual group variables for the groups a host belongs to during a run. There is groups, and there is group_names, but I don't see a groupvars corresponding to hostvars.
This seems to imply I must resort to some mix-and-match of cycling over groups, dynamically importing vars (if possible), and doing the merge myself, perhaps putting some of this in a role. But this feels like such a hack. Is there another approach?
Groups in the Ansible sense are a "tag" on hosts, and hosts can belong to more than one group. So the cronjobs var should be built as a list, with one contribution per group that the host is in.
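One way to do the merge yourself is a sketch along these lines, under the assumption that each group's vars file names its list distinctly (e.g. linux_hosts_cronjobs in group_vars/linux-hosts.yml); all variable and field names here are hypothetical:

```yaml
- hosts: all
  tasks:
    # Build the combined list from every group the host belongs to.
    # Hyphens in group names are replaced because Ansible variable
    # names cannot contain them.
    - name: Merge cron job lists from all group levels
      set_fact:
        cronjobs: >-
          {{ (cronjobs | default([]))
             + lookup('vars', item | replace('-', '_') ~ '_cronjobs',
                      default=[]) }}
      loop: "{{ group_names }}"

    - name: Install the merged cron jobs
      cron:
        name: "{{ item.name }}"
        job: "{{ item.job }}"
        minute: "{{ item.minute | default('*') }}"
      loop: "{{ cronjobs }}"
```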
I've joined a project which has a large number of playbooks and roles, and which makes heavy use of include (often in a nested fashion) in order to include playbooks/roles within existing playbooks/roles. (Whether this is good or bad practice should be considered out of scope of this question, because it's not something I can immediately change. Note also that include_role is not used because these playbooks were written well before 2.2 was out, and are still in the process of being updated.)
Normally when running ansible-playbook, the output just shows each task being run, but it does not show the includes which pull in extra tasks. This makes it hard to see how the overall flow jumps around between playbooks. In contrast, include_vars tasks do appear in the output; I'm guessing this is because include_vars is an Ansible module, whereas include isn't really a module.
So without having to modify the playbooks, is there way to run playbooks which shows the following?
when include directives are triggering, and
(ideally) also the exact files which are being included, since it's not always obvious how relative paths are converted into absolute paths
I've found lots of advice on various ways to debug playbooks, but nothing which achieves this. Bonus points if it also shows when roles are being included via meta role dependencies!
I'm aware that there are tools such as ansigenome which do static analysis of playbooks, but I'm hoping for something which can output the information at playbook run-time, for any playbook I choose to invoke.
If it's not currently possible, would it be a reasonable feature request?
Try executing ansible-playbook -vv, it shows "task path" for every executed task, like this:
TASK [debug] *********************************************
task path: /path/to/task/file.yml:5
ok: [localhost] => {
"msg": "aaa"
}
So you can easily track actual file (included or not) path and line number.
As for includes, there are different types of include in current Ansible versions (2.2, 2.3): static and dynamic.
Static includes happen during parse time and information about them is printed (with -vv verbosity) at the very beginning of playbook run.
Dynamic includes happen in runtime and you can see cyan "included" lines in the output.
After building my final output file with Gradle I want to do 2 things. Update a local version.properties file and copy the final output final to some specific directory for archiving. Let's assume I already have 2 methods implemented that do exactly what I just described, updateVersionProperties() and archiveOutputFile().
I'm now wondering what's the best way to do this...
Alternative A:
assembleRelease.doLast {
    updateVersionProperties()
    archiveOutputFile()
}
Alternative B:
task myBuildTask(dependsOn: assembleRelease) << {
    updateVersionProperties()
    archiveOutputFile()
}
And here I would call myBuildTask instead of assembleRelease as in alternative A.
Which one is the recommended way of doing this and why? Is there any advantage of one over the other? Would like some clarification please... :)
Whenever you can, model new activities as separate tasks. (In your case, you might add two more tasks.) This has many advantages:
Better feedback as to which activity is currently executing or failed
Ability to declare task inputs and outputs (reaping all benefits that come from this)
Ability to reuse existing task types
More possibilities for Gradle to execute tasks in parallel
Etc.
Sometimes it isn't easily possible to model an activity as a separate task. (One example is when it's necessary to post-process the outputs of an existing task in place; doing this in a separate task would result in the original task never being up-to-date on subsequent runs.) Only then should the activity be attached to an existing task with doLast.
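A sketch of the separate-tasks approach, assuming updateVersionProperties() and archiveOutputFile() are methods already defined in the build script (the task names here are made up):

```groovy
// Two dedicated tasks, each with a single responsibility
task updateVersionProps(dependsOn: assembleRelease) {
    doLast {
        updateVersionProperties()
    }
}

task archiveOutput(dependsOn: assembleRelease) {
    doLast {
        archiveOutputFile()
    }
}

// An aggregate task that runs the whole release flow
task myBuildTask(dependsOn: [updateVersionProps, archiveOutput])
```

With this shape, Gradle can report which activity failed, and each piece can declare its own inputs and outputs for up-to-date checking.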