How to exclude instances from the EC2 inventory in Ansible?

We have an Ansible server using EC2 dynamic inventory:
https://github.com/ansible/ansible/blob/devel/contrib/inventory/ec2.py
https://github.com/ansible/ansible/blob/devel/contrib/inventory/ec2.ini
However, with the number of instances we have, running ./ec2.py --list or ./ec2.py --refresh-cache returns a 28,000 line JSON response.
This, I assume, causes it to fail intermittently (returning a Python stack trace) because it only receives a partial response from the call to AWS, but it is then fine if run again.
This is why I want to know if there's a way to cut the response down.
I know there is a way to include specific instances by tag in ec2.ini (i.e. # instance_filters = tag:env=staging), but given the way our instances are tagged, is there a way to exclude instances instead (something that would look similar to: # instance_filters = tag:name=!dev)?

is there a way to exclude instances instead
Just for completeness, I wanted to point out that the "inventory protocol" for ansible is super straightforward to implement, and they even have a JSON Schema for it.
You can see an example of the output it is expecting by running the newly included ansible-inventory script with --list to see the output it generates from one of the .ini style inventories, and then use that to emit your own:
$ printf 'somehost ansible_user=bob\n\n[some_group]\nsomehost\n' > sample
$ ansible-inventory -i ./sample --list
What I am suggesting is that you might have better luck making a custom inventory script, that does know your local business practices, rather than trying to force ec2.py into running a negation query (which, as best I can tell, it will not do).
To generate dynamic inventory, just make an executable -- as far as I know it can be in any language at all -- and then point the -i at the executable script instead of a "normal" file. Ansible will invoke that program, and operate on the JSON output as the inventory. There are several examples people have posted as gists, in all kinds of languages.
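As a minimal sketch of that protocol (the host, group, and variable names below are made up for illustration), an inventory executable only has to answer --list with the inventory JSON; if it includes _meta.hostvars, answering --host with an empty object is enough:
#!/usr/bin/env bash
# Minimal dynamic inventory sketch (hypothetical hosts and groups).
# Ansible invokes it with --list for the whole inventory, or --host <name> for one host.
case "$1" in
  --list)
    cat <<'EOF'
{
  "web": {
    "hosts": ["web01.example.com"],
    "vars": {"ansible_user": "bob"}
  },
  "_meta": {
    "hostvars": {
      "web01.example.com": {"role": "frontend"}
    }
  }
}
EOF
    ;;
  --host)
    # Per-host variables already come back via _meta above, so an empty object is fine.
    echo '{}'
    ;;
esac
Make it executable (chmod +x) and point -i at it, e.g. ansible-inventory -i ./my_inventory.sh --list, where my_inventory.sh is whatever you name the script.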
I would still love it if you would file an issue with Ansible about ec2.py, because you have a situation that can make the bug report concrete for them in ways that a simple "it doesn't work for large inventories" doesn't capture. But in the meantime, writing your own inventory provider is actually less work than it sounds.

I use the option pattern_exclude in ec2.ini:
# If you want to exclude any hosts that match a certain regular expression
pattern_exclude = staging-*
and
hostname_variable = tag_Name
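To sanity-check that the exclusion is applied (assuming, as in the pattern above, that the hosts to drop are named with a staging- prefix), refresh and re-list the inventory and count matching host names; a count of 0 means none made it in:
$ ./ec2.py --refresh-cache > /dev/null
$ ./ec2.py --list | grep -c '"staging-'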

Related

How can I auto generate inputs.tf and outputs.tf variables when working with Terraform?

Note: Please see the UPDATE section below. I've heavily modified the question for clarity about what I'm trying to achieve, but added it as an addendum rather than rewriting the question.
As my infrastructure grows, adding input variables to my variables.tf files and then syncing those values to output variables in my outputs.tf file has become impossible to do manually. Not only does it take up a lot of unnecessary time, probably more time is spent going back and fixing the ones that terraform validate tells me I missed through human error. This is especially true when building / using modules, whose arguments add an additional layer to manage.
There has to be a better way. Here is what I want to achieve.
Let's say I'm creating an Azure AKS Kubernetes cluster. The Terraform resource is azurerm_kubernetes_cluster.
Only 8 arguments are required to create a base install, but there are almost 250 additional ones. They all have default values. Per the documentation page, they also already have fantastic descriptions. (I'm tired of copying and pasting them into my variable { description = "this" } blocks.)
The information is there in the documentation. terraform plan also has knowledge of every single additional one, because it of course comes up in the pre-apply plan. (known after apply) means it's optional, but it will have a default value.
In my dream world, I'd run this hypothetical command sequence:
terraform plan
terraform document <- Here it auto generates every argument as a variables block and inserts it into variables.tf. It also auto generates every possible output "out_putable" {} block and inserts it into outputs.tf.
terraform apply -update-inputs -update-outputs <- Here everything that was optional (known after apply) is now known and it should auto update variables.tf and outputs.tf accordingly. Adding a -update-modules flag lets it take care of that additional layer introduced by using modules.
This feels like a problem that has been addressed before. Before I write a custom tool that parses the Terraform web docs and the output of terraform show, is there already a way to do this? terraform-docs is the closest I've come to finding a solution, for README.md. If it can do what I need, I haven't figured it out yet.
How can I automate all this?
############
UPDATE
############
This article and video are spot-on when it comes to Terraform's evolution in an organization. My organization is somewhere between late-stage pattern 3 and early pattern 5. As we decompose our "Terralith" we have inconsistencies among teams (patterns, naming conventions, variable and argument choices, etc.). These are starting to cause errors in CI/CD, forcing a ticket-review process that is slowing things down.
All resources have required and optional arguments. But in my organization, we have, for example, additional optional arguments that are required for us.
Scenario: Dev A in Japan creates a resource, forgets an optional variable or two or names them something obscure, etc. Dev B in America is blocked until they can convene and discuss. Given time zones, language differences, ticket review, this one issue is now a week or more delayed.
I need to automate this and create exact consistency so that Dev A starts out with exactly what Dev B would start with or is expecting; and, what CI/CD tests are expecting - templating the initial process, if you will. In other words, I need to remove the human element of manually creating main.tf, variables.tf, outputs.tf, etc.
Here are thoughts on how to achieve this:
Use Golang to autogenerate the files by querying the API
How can I query the API to get a list of all required arguments for a specific resource?
I found that I can query for provider information, but I can't find a way to retrieve resource information. My thinking is that when a developer wants to create a new resource, he'll run a Go or TypeScript script to generate the manifest files along with the expected naming conventions, and populate main.tf, variables.tf, outputs.tf, etc., with exactly the data everyone is expecting. I'm looking for something like curl registry.terraform.io/providers/hashicorp/azurerm/v2.99/resource_group?required=yes, which should show me all required arguments along with descriptions and other info I can use straight from the API.
Use CDKTF to generate an HCL manifest.tf file from JSON
How can I use CDKTF to generate an HCL .tf file?
CDKTF is EXACTLY what I'm looking for - except in reverse. HCL is seamlessly compatible with JSON. Running cdktf synth creates ./out/cdk.tf.out. I'm so close! How do I turn that file into main.tf?!?
The goal here is to have a master file from which all future manifest files are derived. Whether we use azurerm_kubernetes_cluster 1 time or 1000 times, I know for certain that every argument, every variable name, every desired output is exactly the same. If a change is needed in our desired structure, it will be updated at the JSON level, and CI/CD can ensure those changes are propagated across instances of its use.
I know that I can use the cdk.tf.out file as a drop-in replacement for a module, but I don't want my team members to have to learn TypeScript or how to read JSON. If I can create a templatized JSON file containing exactly what I'm expecting users to start with, and if they can run some command like cdktf convert cdk.tf.out --HCL output-file.tf, then I've accomplished my goal.
If cdktf synth can create an HCL JSON file, and cdktf convert can take a manifest.tf file and turn it into HCL JSON, can't it do the exact opposite? Turn the HCL JSON file into the human-readable, declarative, manifest.tf file?
Perhaps think of it this way. Terraform has a required file structure for a module if it's to be allowed into the module registry. I'm trying to create a similar required structure for each of the resources our organization uses regardless of when and where it's used.
If your goal is to derive input variables and output values from resource type schemas then Terraform can provide you with the information to do so.
In the working directory of a configuration that already uses the provider whose resource type you want to use, run the following command:
terraform providers schema -json
The result contains a JSON description of all of the resource types available in the providers for the current configuration, and for each one the metadata about its attributes, including the type constraint information and descriptions for each one.
From that you can generate whatever other files you need based on that information.
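As a hedged sketch (the provider address, resource type, and jq filter are assumptions chosen for this example), the required top-level arguments and their descriptions for a single resource type can be pulled out of that JSON like so:
# List required top-level arguments of azurerm_kubernetes_cluster.
# Note: required arguments can also appear inside nested blocks (block_types), not covered here.
terraform providers schema -json | jq -r '
  .provider_schemas["registry.terraform.io/hashicorp/azurerm"]
  .resource_schemas["azurerm_kubernetes_cluster"]
  .block.attributes
  | to_entries[]
  | select(.value.required == true)
  | "\(.key): \(.value.description // "")"
'
The same document also contains the optional and computed attributes, so emitting variable and output blocks instead of a plain listing is mostly a matter of changing the final string template.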
Note that if you are intending to build modules which export the entire surface area (all inputs and all outputs) of a particular resource type, the Terraform documentation explicitly recommends against this, suggesting that you just use the resource type directly instead, since such a module would often not offer enough benefit to outweigh the additional complexity and maintenance overhead it implies:
In principle any combination of resources and other constructs can be factored out into a module, but over-using modules can make your overall Terraform configuration harder to understand and maintain, so we recommend moderation.
A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers.
For example, aws_instance and aws_elb are both resource types belonging to the AWS provider. You might use a module to represent the higher-level concept "HashiCorp Consul cluster running in AWS" which happens to be constructed from these and other AWS provider resources.
We do not recommend writing modules that are just thin wrappers around single other resource types. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead.
I had the same question and developed a small bash script to create output definitions based on the module code.
The script requires the hcledit tool to extract blocks from the HCL code:
#!/usr/bin/env bash
set -o pipefail

_hcledit=$(which hcledit)

# Walk every .tf file in the current directory and emit an output block
# for each module and resource block that hcledit finds.
for tf_file in *.tf; do
    "$_hcledit" block list < "$tf_file" | while read -r line; do
        block_type="${line%%.*}"
        line="${line#*.}"
        case $block_type in
            locals|output|variable|data)
                # Nothing to export for these block types.
                continue ;;
            module)
                output_name=$line
                output_description="Module '$output_name' attributes"
                output_value="$block_type.$output_name"
                ;;
            resource)
                label_kind="${line%.*}"
                label_name="${line#*.}"
                output_name="${label_kind}_${label_name//-/_}"
                output_description="Resource '$label_kind.$label_name' attributes"
                output_value="$label_kind.$label_name"
                ;;
        esac
        cat <<EOT
output "$output_name" {
  description = "$output_description"
  value       = $output_value
}
EOT
    done
done
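Assuming the script is saved as generate_outputs.sh (a hypothetical name) in the module directory, the generated definitions can be reviewed on stdout or written to a file:
$ bash generate_outputs.sh
$ bash generate_outputs.sh > outputs.generated.tf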

Ansible - include one playbook into another in loop

I am new to Ansible and trying to figure out how I can call one playbook from another playbook in a loop. I also want to consume the output back in the master playbook. I'm not sure whether this is possible in Ansible.
Below is a stub from other programming languages -
masterplaybook.yml - from where I want to invoke auditplaybook
for devicePair in devicePairList
output = auditdevice.yml -e "d1=devicePair.A d2=devicePair.B"
save/process output
The auditdevice.yml playbook uses d1 and d2 as the hosts on which it performs the auditing, runs commands, etc. It performs the audit against a dynamic inventory passed in as part of the arguments.
Is it possible to achieve above using Ansible? If yes, can someone point to any example?
Q: "How can I call one playbook from another playbook in the loop?"
A: It is not possible. Quoting from import_playbook
"You cannot use this action inside a play."
See the example.
FWIW, ansible-runner is able to control playbooks within projects, similar to AWX. See the example.
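If the goal is simply to run auditdevice.yml once per device pair and collect each run's output, one pragmatic workaround is to loop outside Ansible in a shell script. This is only a sketch; device_pairs.txt is a hypothetical file listing one "deviceA deviceB" pair per line:
#!/usr/bin/env bash
# Run the audit playbook once per device pair and keep each run's output.
while read -r d1 d2; do
    ansible-playbook auditdevice.yml -e "d1=$d1 d2=$d2" | tee "audit_${d1}_${d2}.log"
done < device_pairs.txt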

aws_ec2 inventory plugins: Creating host variables based on AWS tags

Is there a way to create host variables based on the (AWS) tags on an instance? (without manually listing each variable and tag under the compose section)
The idea is basically to allow the equivalent of the INI-style inventory file using tags from AWS. (With the relevant variables using a prefix to identify them)
e.g. If an instance has these tags: (The array syntax is assumed from what works in the INI-style inventory and other options are acceptable)
ansible_swapfile_size=4G
ansible_extra_packages=["htop","iotop"]
I want these variables to be configured:
swapfile_size: 4G
extra_packages:
- htop
- iotop
compose seems to be capable of doing it if I list a fixed set of variables, but that means that if a new variable is added, the inventory would need to be edited. (There does not seem to be a way to use a loop to build the keys that are set.)
Is there a way to dynamically generate variables based on tags (in the inventory)?
This tries to do the same, but that solution requires editing the playbook. I want the logic in the inventory - otherwise the playbook eventually ends up with a hack for every different inventory source.

Get Ansible play/task/config level keyword/directives help/reference when in command line. Is it possible?

I'm looking for an equivalent of the ansible-doc command to help with play/task syntax and keywords.
When I use Ansible from a non-graphical host, there is a very handy command line tool, ansible-doc, for help with options/keywords at the module level. However, if I need to check task-level syntax keywords, let's say I forget how to write loops, I have to look it up in the Ansible online documentation. (Well, loop is hard to forget, but it can be something less common and obvious, like the serial keyword.) Is there a command line tool to achieve this? Or maybe a way to obtain the Ansible online documentation in a compact format that is readable from the command line?

How to loop over playbook include?

(I'm currently running Ansible 2.1)
I have a playbook that gathers a list of elements and I have another playbook (that calls different hosts and whatnot) using said element as the basis for most operations. Therefore, whenever I use with_items over the playbook, it causes an error.
The loop control section of the docs say that "In 2.0 you are again able to use with_ loops and task includes (but not playbook includes) ". Is there a workaround? I really need to be able to call multiple hosts in an included playbook that runs over a set of entries. Any workarounds, ideas for such or anything are greatly appreciated!
P.S. I could technically use command: ansible-playbook, but I don't want to go down that rabbit hole unless necessary.
I think I faced the same issues; by the way, migrating also surfaced warnings about the loop variable 'item' already being in use.
Referring to http://docs.ansible.com/ansible/playbooks_best_practices.html , you should have an inventory (that contains all your hosts) and a master playbook (even if theoretical).
A good way, instead of including playbooks, is to design roles, even if they are empty. Try to find a "common" role for everything that could be applied to most of your hosts. Then include additional roles depending on usage; this will let you trigger the right things on the correct hosts.
You can also have roles that do nothing (meaning, nothing in 'tasks') but that contain sets of variables that can be common to two roles (you then avoid duplicate entries).
