This is a common case, but it doesn't seem straightforward in Ansible.
Let's assume a hierarchy of groups:
linux-hosts:
application-hosts:
foobar-application-hosts:
foobar01
Now for each of these groups we want to define a set of cron jobs.
For linux-hosts, jobs that run on all linux hosts.
For application-hosts, jobs that run on only application hosts.
For foobar-application-hosts, jobs that run on only foobar-application-hosts.
The variable name is cronjobs, say, and it's a list of cron module settings.
By default, the cronjobs value from foobar-application-hosts would clobber the values from the groups above it. Not good.
I don't see an easy way to merge (on a specific level). So I thought, all right, perhaps Ansible exposes the individual group variables for the groups a host belongs to during a run. There is groups, and there is group_names, but I don't see a groupvars corresponding to hostvars.
This seems to imply that I must resort to some mix-and-match of cycling over groups, dynamically importing vars (if possible), and doing the merge myself. Perhaps putting some of this in a role. But this feels like such a hack. Is there another approach?
A group in the Ansible sense is a "tag" on hosts. Hosts can belong to more than one group. So the cronjobs var should be a list with the same length as the number of groups the host is in.
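This limitation is why people usually work around the clobbering by giving each group's list a distinct name in its group_vars file and merging at runtime. A minimal sketch, assuming a hypothetical naming convention cronjobs__<group> (e.g. cronjobs__linux_hosts defined in group_vars/linux-hosts.yml) and a reasonably recent Ansible (the vars lookup's default option needs >= 2.5):

- name: Merge the cronjobs lists from every group this host belongs to
  set_fact:
    # group names may contain hyphens, which are invalid in variable names,
    # so map them to underscores before building the variable name
    merged_cronjobs: "{{ merged_cronjobs | default([]) + lookup('vars', 'cronjobs__' + item, default=[]) }}"
  loop: "{{ group_names | map('replace', '-', '_') | list }}"

- name: Install the merged cron jobs
  cron:
    name: "{{ item.name }}"
    job: "{{ item.job }}"
    minute: "{{ item.minute | default('*') }}"
  loop: "{{ merged_cronjobs }}"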
Related
Right now, I am creating files to hold variables that need to persist. But I'm curious if there's a simpler way to create variables that outlive the script that set them.
I find Redis invaluable for persisting data like this. It is a quick and lightweight installation and allows you to store many types of data:
strings, including complete JSONs and binary data like JPEG/PNG/TIFF images - also with TTL (Time-to-Live) so data can be expired when no longer needed
numbers, including atomic integers, floats
lists/queues/stacks
hashes (like Python dictionaries)
sets, and sorted (ordered) sets
streams, bitfields, geospatial data and the more esoteric HyperLogLogs
PUB/SUB is also possible, where one or more machines/processes publish items and multiple consumers, who have subscribed to that topic, receive the published items.
It can also perform very fast operations on your data for you, like set intersections and unions, getting lengths of lists, moving items between lists, atomically adding/subtracting from numbers and so on.
You can also use it to pass data between processes, sub-processes, shell scripts, parent and child, child and parent (!) scripts and so on.
In addition to all that, it is networked, so you can set variables on one computer and read/alter them from another - very simply. For example, you can PUSH jobs to a queue, potentially from multiple machines, and run workers on multiple machines that wait for jobs on the queue, process them and return results to another list.
There is a discussion of the things you can store here.
Example: Store a string, then retrieve it:
redis-cli SET name fred
name=$(redis-cli GET name)
Example: Increment the view count of page 2 by 10, then retrieve it from a different machine on the network:
redis-cli INCRBY views:page:2 10
views=$(redis-cli -h 192.168.0.10 GET views:page:2)
Example: Push a value onto a list:
redis-cli LPUSH shoppingList bananas
Example: Blocking wait for the next item in a list - use RPOP for non-blocking. Note that BRPOP requires a timeout (0 means wait forever) and returns both the list name and the value:
item=$(redis-cli BRPOP shoppingList 0)
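Example: Publish/subscribe - a minimal sketch with an arbitrary channel name. In one terminal, subscribe (this blocks and prints anything published on the channel):
redis-cli SUBSCRIBE jobs
Then, from another terminal (or another machine), publish:
redis-cli PUBLISH jobs "run backup"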
Also, there are bindings for Python, C/C++, Java, Ruby, PHP etc., so you can "inject" dummy/test data into, or extract debug data from, a running Python program using the redis-cli tool - even from a different computer.
Use environment variables to store your data.
ABC="abc"; export ABC
The other question is how to make environment variables persist after a reboot.
Depending on your shell, you may have a different file in which to persist the variables.
If you are using bash, run a command like this, containing the variable's last value, before rebooting:
echo 'export ABC="hello"' >> $HOME/.bashrc
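To pick the variable up in the current shell without logging out and back in:
source $HOME/.bashrc
echo "$ABC"   # prints hello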
I think this is a good time to be using an SQL Database. It's more scalable and functional than having a fileful of "persistent variables".
It may require a little more setup, and I admit it isn't "simpler" per se, but it will probably be worth it in the long run. You will be able to do more with your variables, and that may make your future scripts simpler.
I recommend going to YouTube and finding a simple tutorial on how to set up a local MySQL or MSSQL instance. There is a guy, Mike Dane, who makes really beginner-friendly tutorials. Try searching "GiraffeAcademy SQL Beginner" and see if that helps you.
What is the difference between
vars_files directive
and
include_vars module
Which should be used when? Is either of the above deprecated or discouraged?
Both vars_files and include_vars are labeled as stable interfaces, so neither of them is deprecated. They have some commonalities, but they serve different purposes.
vars_files:
The vars_files directive can only be used when defining a play, to specify variable files. The variables from those files are included in the playbook. Since it is used at the start of the play, this most likely implies that some other play (before this one) created those vars files, or that they were created statically before the run; in other words, they are essentially configuration variables for the play.
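For illustration, a minimal play using vars_files (the file paths are hypothetical):

- hosts: webservers
  vars_files:
    - vars/common.yml
    - vars/webservers.yml
  tasks:
    - debug:
        var: some_var_defined_in_those_files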
include_vars:
vars_files serves the single purpose of including vars from a set of files, but it cannot handle cases where:
the vars files are created dynamically and you want to include them in the play;
you want to include vars in a limited scope;
you have multiple vars files and want to include them based on certain criteria, e.g. if a local database exists, include its configuration, otherwise include the configuration of a remotely hosted database (as sketched below);
you want to override the default configuration (vars) - include_vars has a higher priority than vars_files, so it can be used for this;
you want vars to be evaluated lazily - include_vars is evaluated at the point in the play where it is used;
you want to include vars dynamically using a loop;
you want to read a file into a named dictionary instead of reading all its variables into the global namespace (as sketched below);
you want to include all files in a directory, or some subset of them (based on a prefix or an exclusion list), without knowing the exact names of the vars file(s).
These are some of the cases I can think of; if you hit any of them, you need include_vars.
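For illustration, minimal sketches of the two cases flagged above (all file names are hypothetical):

# pick whichever vars file exists first
- name: Include local or remote database configuration
  include_vars: "{{ lookup('first_found', ['db_local.yml', 'db_remote.yml']) }}"

# read a file into a named dictionary instead of the global namespace
- name: Load app settings under the app_settings key
  include_vars:
    file: app_settings.yml
    name: app_settings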
vars_files are read when the play starts. include_vars are read when the play reaches the task. You might also be interested in Variable precedence: Where should I put a variable?
Some updated information...
import_* is considered static reuse, and vars_files is a type of import.
include_* is considered dynamic reuse.
As Vladimir mentioned,
vars_files are read when the play starts. include_vars are read when the play reaches the task
Like all static items, vars_files are read before the play starts, unlike include_vars, which are "included" when the play reaches them.
One of the biggest differences between static reuse and dynamic reuse is how the variables or tasks within them are processed. All static reuse items are processed with the linear strategy by default: all hosts stay in lockstep with each other, and each task has to complete on ALL hosts before the next task can begin. Hosts that are skipped actually get a noop task to process.
Dynamic reuse does not change the strategy from linear, but it does change the order in which tasks are processed. With dynamic reuse, the entire group of tasks must complete on a single host before it is processed by the next host. Unfortunately, all the other hosts get to twiddle their noops while they wait.
Include statements are good when you need to 'loop' a host through a series of tasks with registered outputs and do something with that information before the next host starts.
Import statements are good when you need to collect information or perform a task on a group of hosts before the next task can start for any host.
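In task-file terms, the same distinction looks like this (file names hypothetical; import_tasks/include_tasks are the post-2.4 spellings):

- import_tasks: setup.yml             # static: resolved when the playbook is parsed
- include_tasks: "{{ flavor }}.yml"   # dynamic: the variable is resolved at runtime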
Here is a really good table that compares all the different include_* and import_* functions: Comparing includes and imports: dynamic and static re-use.
Just as an FYI, here is a link to more information about performance strategies and how you can improve performance. How can I improve performance for network playbooks?
I've joined a project which has a large number of playbooks and roles, and which makes heavy use of include (often in a nested fashion) in order to include playbooks/roles within existing playbooks/roles. (Whether this is good or bad practice should be considered out of scope of this question, because it's not something I can immediately change. Note also that include_role is not used because these playbooks were written well before 2.2 was out, and are still in the process of being updated.)
Normally when running ansible-playbook, the output just shows each task being run, but it does not show the includes which pull in extra tasks. This makes it hard to see how the overall flow jumps around between playbooks. In contrast, include_vars tasks are shown in the output. I'm guessing this is because include_vars is an Ansible module, whereas include isn't really a module.
So, without having to modify the playbooks, is there a way to run them which shows the following?
when include directives are triggering, and
(ideally) also the exact files which are being included, since it's not always obvious how relative paths are converted into absolute paths
I've found lots of advice on various ways to debug playbooks, but nothing which achieves this. Bonus points if it also shows when roles are being included via meta role dependencies!
I'm aware that there are tools such as ansigenome which do static analysis of playbooks, but I'm hoping for something which can output the information at playbook run-time, for any playbook I choose to invoke.
If it's not currently possible, would it be a reasonable feature request?
Try executing ansible-playbook -vv; it shows the "task path" for every executed task, like this:
TASK [debug] *********************************************
task path: /path/to/task/file.yml:5
ok: [localhost] => {
"msg": "aaa"
}
So you can easily track actual file (included or not) path and line number.
As for includes, there are different types of includes in current Ansible versions (2.2, 2.3): static and dynamic.
Static includes happen at parse time, and information about them is printed (with -vv verbosity) at the very beginning of the playbook run.
Dynamic includes happen at runtime, and you can see cyan "included" lines in the output.
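For illustration, a dynamic include shows up roughly like this (the path and host name here are made up):
included: /path/to/tasks/extra.yml for host1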
In the docs there is an option for a system group. What exactly is a system group? I couldn't find this detail anywhere.
If yes, indicates that the group created is a system group.
“System groups” are usually lower numbered than non-system groups, commonly 0-99. There’s a little relevant info in the groupadd(8) man page:
-r, --system
Create a system group.
The numeric identifiers of new system groups are chosen in the
SYS_GID_MIN-SYS_GID_MAX range, defined in login.defs, instead of
GID_MIN-GID_MAX.
Example groups are http, dbus, wheel, mail.
More details in this Q/A.
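For reference, a minimal Ansible task that uses this option (the group name is hypothetical):

- name: Create a system group for a service account
  group:
    name: myservice
    system: yes
    state: present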