Improved way of scaling in SaltStack - YAML

I have a problem with Jinja2 templating, and that problem is breaking a one-line string over multiple lines when writing a state or anything else in Salt [my exact case is writing a list of machines one after the other, as a list, instead of on one really long line].
What I am trying to say is that I want to achieve this:
nodegroups:
- group: 'L#adsdasdadas' +
'dasdasdasdas'
............. -> imagine 10,000 names coming here
'adsasdasddsa'
Compared to the approach that I have to use now:
nodegroups:
- group: 'L#adsdasdadas,dasdsadasdsa,dasdsadasdsa,......,asdqwe'
Is there a better way to do it? Is there a better way to handle thousands of machines?
You could say grains, and I thought about it, but I was wondering if there's a better, more elegant way of doing it.
Any thoughts or opinions would help me a lot.
[Edit1]:
I wrote a script that takes a list of hostnames and adds them to the master config file in the nodegroups section. For now it might work.

Choice of data source
I would recommend targeting with pillars, because they are managed centrally from the Master (convenient), rather than with static custom grains, which are configured separately on each Minion (inconvenient) - see the comparison summary here.
Limitations of configuration files
The nodegroups are specified in the Salt configuration file /etc/salt/master, which is not a Jinja template (it is pure YAML). So you don't have the option of using Jinja there to join an external list of strings.
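Note that pure YAML cannot work around this either: a folded scalar lets you break a long line, but it joins the folded lines with a space, so it cannot produce the space-free comma list a compound matcher needs. A quick illustration (host names are made up; L@ is Salt's list-matcher prefix):

nodegroups:
  group1: >-
    L@host1,host2,
    host3

This folds to 'L@host1,host2, host3' - note the unwanted space before host3.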
Possible solution
Why is joining even mentioned? You can turn the problem of "breaking a one-line string over multiple lines" into the solution of using lists right away - there is no need to break anything (and if you need a "one-line string" somewhere, joining list items is easy).
In other words, you could define nodegroups via pillar (avoiding long strings as in your example). Pillars, in turn, are rendered by Jinja. Therefore, using the same list of Minions defined somewhere, you could generate the derived product in pillars through Jinja (be it a joined string or the list as is). There is a trick which allows reusing the same external data in multiple pillar files.
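A minimal sketch of that trick, assuming a hypothetical shared data file machines.yaml in the pillar root (all file names, keys, and host names are illustrative):

# /srv/pillar/machines.yaml - single source of truth
machines:
  - web01
  - web02
  - db01

# /srv/pillar/info.sls - rendered by Jinja before being parsed as YAML
{% import_yaml 'machines.yaml' as data %}
info: {{ data.machines | yaml }}                 # the list, reused as is
info_line: "{{ data.machines | join(',') }}"     # joined one-line string, if needed

Both import_yaml and the yaml filter come with Salt's Jinja renderer, so the same machines.yaml can feed as many pillar files as needed.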

First of all I would like to thank uvsmtid for the wonderful idea. Sorry for the confusion created, too.
So, what I did was create a pillar with the name of each Minion (which happens to be its ID), and then in a state I compared each value from that list to the actual ID of the Minion:
{% for item in salt['pillar.get']('info') %}
{% if grains['id'] == item %}
something:
  cmd.run:
    - name: touch something
{% endif %}
{% endfor %}
I hope this solution will help someone the same way it helped me.

Related

What is the essential difference between Document and Collection in YAML syntax?

Warning: This question is more philosophical than practical, but I find it fit to be asked and answered in practical contexts (forums like StackOverflow here, rather than the SoftwareEngineering Stack Exchange site), due to the way YAML is actually used de facto and the way its specification has evolved and gained features over time. Let's ask:
As opposed to formats/languages/protocols such as JSON, the YAML format allows you (according to this link, which seems to be a fairly official, or at least accurate and reliable, source for understanding the YAML specification) to embed multiple 'Documents' within one file/stream, using the three-dashes marker ("---").
If so, it's hard to ignore the fact that the concept/model/idea of a 'Document' in YAML is no longer an external definition, or a "meta" directive that helps the human/parser organize multiple distinct documents alongside each other (similar to the way file systems define the concept of a "file" to organize different files, while each file in itself does not necessarily recognize that it is a file, or that it is part of a file system that wraps it, AFAIK).
However, when YAML allows multi-Document YAML files that gather collections of Documents in a single YAML file (perhaps in a way that is similar/analogous to the HTTP pipelining approach of the HTTP protocol), the concept/model/idea/goal of a Document receives a new, wider de-facto definition/character, as a part of the YAML grammar and its products, and not just as an assistive concept or format description that helps describe the specification.
If so, with Document being part of the language itself, what is the added value of this data structure compared to the existing, familiar, and well-used good old data structure of a Collection (an array of items)?
I'm asking because I've seen at this link (here) a snippet (the second example) that describes what is actually a collection of logs. For some reason, the author of the example chose to present each log entry as a separate "Document" (separated by three dashes), gathered together in the same YAML file, instead of writing a file that holds a "Collection" of logs represented as an array. Why did he choose to do this? Is his choice fitting, correct, ideal?
I can speculate that the added value of the distinction between a Document and a Collection becomes relevant when using more advanced features of the YAML grammar, such as Anchors, Tags, and References. I guess each Document provides a guarantee that all these identifiers form a unique set, with no collisions or duplicates among them. Am I right? And if so, is this the only advantage, or are there more justifications for the existence of these two quite similar data structures?
My best view for now is to see a Document as a "meta" Collection that is stricter and lacks high-level logic, or as two different layers of collection schemes. Is that a correct, accurate way to see it?
And even if I am right, why, in the above example (the logs document from the link), where there is no use of (nor any implied or expected use of) duplications, collisions, identifiers/anchors, or compound structures at all, does the author still choose to represent the collection's items as separate documents? Is this just a not-so-successful choice of example? Or am I missing something, and this is a redundancy in the specification, or syntactic sugar that evolved out of practical needs?
Because the example was written on a website that looks serious, with official information written by professionals who deal with the essence of the language and its definition, theory, and philosophy (as opposed to practical uses in the wild), and in light of the other examples provided there and the added value of their being meticulous, I prefer not to assume that the example is simply imperfect or careless, and to believe that there may be a good reason to write it this way rather than another in the specific case exemplified.
First, let's look at the technical difference between a list of documents in a YAML stream and a YAML sequence (which is a collection of ordered items). For this, I'll discuss YAML tags, which are an advanced feature, so here's a quick overview:
YAML nodes can have tags, such as !!str (the official tag for string values) or !dice (a local tag that can be interpreted by your application but is unknown to others). This applies to all nodes: scalars, mappings, and sequences. Nodes that do not have such a tag set in the source will be assigned the non-specific tag ?, except for quoted scalars, which get ! instead. These non-specific tags are later resolved to specific tags, thereby defining what kind of data structure the node will be deserialized into.
YAML implementations in scripting languages, such as PyYAML, usually only implement resolution by looking at the node's value. For example, a scalar node containing true will become a boolean value, 42 will become an integer, and droggeljug will become a string.
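A small YAML snippet illustrating these rules (the values are chosen arbitrarily):

a: true          # non-specific tag ?, resolved by value to !!bool
b: 42            # resolved to !!int
c: droggeljug    # resolved to !!str
d: "42"          # quoted scalar: non-specific tag !, stays !!str
e: !!str true    # explicit tag, no resolution needed
f: !dice 2d6     # local tag; its meaning is up to the application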
YAML implementations for languages with static types, however, do this differently. For example, assume you deserialize your YAML into a Java class
public class Config {
    String name;
    int count;
}
Assume the YAML is
name: 42
count: five
The 42 will become a String despite the fact that it looks like a number. Likewise, five will generate an error because it is not a number; it will not be deserialized into a string. This means that it is not the content of the node that defines how it will be deserialized, but the path to the node.
What does this have to do with documents? Well, the YAML spec says:
Resolving the tag of a node must only depend on the following three parameters: (1) the non-specific tag of the node, (2) the path leading from the root to the node and (3) the content (and hence the kind) of the node.
So, the technical difference is: If you put your data into a single document with a collection at the top, the YAML processor is allowed to take into account the position of the data in the top-level collection when resolving a tag. However, when you put your data in different documents, the YAML processor must not depend on the position of the document in the YAML stream for resolving the tag.
What does this mean in practice? It means that YAML documents are structurally disjoint from one another. Whether a YAML document is valid or not must not depend on any preceding or succeeding documents. Consequently, even when deserialization runs into a semantic problem (such as with the five above) in one document, a following document may still be deserialized successfully.
The goal of this design is to be able to concatenate arbitrary YAML documents together without altering their semantics: A middleware component may, without understanding the semantics of the YAML documents, collect multiple streams together or split up a single stream. As long as they are syntactically correct, stream splitting and merging are sound operations that do not invalidate a YAML document even if another document is structurally invalid.
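A minimal illustration of this guarantee (field values are arbitrary): a middleware component can concatenate the two documents below, or split them apart again, without changing what either of them means, and a type error in one cannot poison the other.

---            # document 1
name: first
count: 3
...            # explicit end marker; a merger may always append this safely
---            # document 2
name: second
count: five    # a semantic error here affects this document only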
This design primarily focuses on sending and receiving data over networks. Of course, nowadays YAML is primarily used as a configuration language, which is why this feature is seldom used and of rather little importance.
Edit: (Reply to comment)
What about edge cases, like a string-tagged Document that starts with a folded string, making even the following "---" and "..." just characters of the global string?
That is not the case, see rules l-bare-document and c-forbidden. A line containing un-indented ... not followed by non-whitespace will always end a document if one is open.
Moreover, ... doesn't do anything if no document is open. This ensures that a stream merger can always append ... to a document to ensure that the current document is closed, but no additional one is created.
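A quick demonstration of that rule (the scalar content is arbitrary): the folded scalar below cannot swallow the markers that follow it.

--- >
  this folded string ends at the
  un-indented end marker below
...
--- another, completely separate document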
--- has been widely adopted as a separator between YAML documents (and, perhaps more prominently, between YAML front matter and content in tools like Jekyll) in places where ... would have been more appropriate. This gives the false impression that --- is what tooling should use to separate documents, when in reality ... is the syntactic element designed for that use case.

Am I misunderstanding something about using the List and Fetch combination?

I am trying to understand the combination of List and Fetch processors.
I have a directory with three JSON files, and I use ListAzureDataLakeStorage to list them. But when I connect a FetchAzureDataLakeStorage with which I intend to take only one of the files, the Fetch takes the same file three times. In summary, it takes the file whose azure.filename matches the value that I put in the File Name property, but as many times as there are files in the listed directory.
I really want to use a single List and connect three Fetches to it, each one to take a different file, and thus use them for different streams.
In each Fetch I put in the "File Name" property the name of the file that I want to take. For example:
File Name: fileName1.json
I have also tried putting the following Expression Language into "File Name":
File Name: ${azure.filename:equals('fileName1.json')}
But this option causes a 404 empty-body error.
There seems to be no way. Am I misunderstanding something about using the List and Fetch combination?
If you are statically entering file names, then every FlowFile that reaches the Fetch triggers a fetch of that same static name - which is why you see the same file fetched once per listed file. If you want to respond to each file differently, the ListX processors aren't very beneficial to your flow.
The easier option would be to use a GenerateFlowFile processor with the appropriate schedule to trigger a corresponding FetchX processor.
If you're only doing this for 3 files, it's not too much manual overhead. You could also achieve something similar using RouteOnContent/Attribute.

Append in host_vars to list from group_vars/defaults

Suppose I have a list configured in the role defaults (under roles/myrole/defaults/main.yml):
the_list:
- one
- two
And suppose that for a particular host I also need to add three to the list. Is that possible?
By default, the list is overridden rather than concatenated. E.g. if I put this into host_vars:
the_list:
- three
... then the resulting list will include just three; the other two elements will be lost.
Is there any way to merge the lists? Maybe with some kind of YAML / Jinja magic?
Thanks!
There have been a number of issues and feature requests raised around this on the Ansible GitHub; see this pull request for example. In summary, there isn't a good way to do this yet; hopefully there will be soon.
A common workaround for the time being is to define a list values in one place and a second list extra_values elsewhere, then merge them before use.
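A minimal sketch of that workaround (the names values and extra_values come from the answer above; the merge itself is plain Jinja list concatenation):

# roles/myrole/defaults/main.yml
values:
  - one
  - two

# host_vars/somehost.yml - only for hosts that need extras
extra_values:
  - three

# wherever the merged list is consumed, e.g. in vars or a task:
the_list: "{{ values + (extra_values | default([])) }}"

The default([]) keeps hosts that define no extra_values working unchanged.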

How to exclude instances of the EC2 inventory in Ansible?

We have an Ansible server using EC2 dynamic inventory:
https://github.com/ansible/ansible/blob/devel/contrib/inventory/ec2.py
https://github.com/ansible/ansible/blob/devel/contrib/inventory/ec2.ini
However, with the number of instances we have, running ./ec2.py --list or ./ec2.py --refresh-cache returns a 28,000 line JSON response.
This, I assume, causes it to fail randomly (returning a Python stack trace) because it only receives a partial response from the call to AWS; it is then fine if run again.
Which is why I want to know if there's a way to cut this down.
I know there is a way to include specific instances by tag in the ec2.ini (e.g. # instance_filters = tag:env=staging), but given the way our instances are tagged, is there a way to exclude instances instead (something that would look similar to # instance_filters = tag:name=!dev)?
is there a way to exclude instances instead
Just for completeness, I wanted to point out that the "inventory protocol" for Ansible is super straightforward to implement, and they even have a JSON Schema for it.
You can see an example of the expected output by running the newly included ansible-inventory script with --list to see what it generates from one of the .ini-style inventories, and then use that to emit your own:
$ printf 'somehost ansible_user=bob\n\n[some_group]\nsomehost\n' > sample
$ ansible-inventory -i ./sample --list
What I am suggesting is that you might have better luck making a custom inventory script, that does know your local business practices, rather than trying to force ec2.py into running a negation query (which, as best I can tell, it will not do).
To generate dynamic inventory, just make an executable -- as far as I know it can be in any language at all -- and then point -i at the executable script instead of a "normal" file. Ansible will invoke that program and operate on its JSON output as the inventory. There are several examples people have posted as gists, in all kinds of languages.
I would still love it if you would file an issue with Ansible about ec2.py, because you have a situation that can make the bug report concrete for them in ways that a simple "it doesn't work for large inventories" doesn't capture. But in the meantime, writing your own inventory provider is actually less work than it sounds.
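For orientation, this is roughly the shape the --list output needs to have, shown as YAML for readability (a real script prints the JSON equivalent; group and host names are illustrative, and the exact top-level layout varies a bit between Ansible versions):

_meta:
  hostvars:
    somehost:
      ansible_user: bob
all:
  children:
    - some_group
    - ungrouped
some_group:
  hosts:
    - somehost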
I use the option pattern_exclude in ec2.ini:
# If you want to exclude any hosts that match a certain regular expression
pattern_exclude = staging-*
and
hostname_variable = tag_Name

Which module to use to edit files - Ansible

I want to edit the configuration file of Telegraf (a system-metrics-collecting agent).
Telegraf comes with a default config file that can be edited. There are many input and output plugins defined in there, commented out; they can be enabled by removing the comments and can also be customized.
I want to edit only some of the plugins defined there, not all of them. For example, consider this file:
[global]
interval='10s'
[outputs.influxdb]
host=['http://localhost:8086']
#[outputs.elasticsearch]
# host=['http://localhost:9200']
[inputs.netstat]
interface='eth0'
Now, I want to edit three blocks: global, outputs.influxdb, and inputs.netstat. I don't want to edit outputs.elasticsearch, but the outputs.elasticsearch block should remain in the file.
Using Ansible, I first tried the template module, but with that approach the commented-out data would be lost.
Then I tried the ini_file module; instead of editing the block that is already present, it adds a new block even if one already exists, resulting in something like this:
[outputs.influxdb]
host=[http://localhost:8086]
[outputs.influxdb]
host=[http://xx.xx.xx.xx:8086]
Which module is ideal for my scenario?
There are several options, depending on your purpose.
The lineinfile module is the best option if you just want to add, replace, or remove one line.
The replace module is best if you want to add, replace, or delete several lines.
The blockinfile module can add several lines, surrounded by markers.
If you only want to change two or three lines, you could use that many calls to lineinfile (see the sketch below). To change a whole config file, I would recommend, as the commenters suggest, using the template module.
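For illustration, a minimal lineinfile task for one of the values from the question (the config path is an assumption; adjust it to wherever your telegraf.conf actually lives):

- name: Set the global collection interval
  lineinfile:
    path: /etc/telegraf/telegraf.conf   # assumed location
    regexp: '^\s*interval='             # assumes interval= appears in only one section
    line: "interval='10s'"

Keep in mind that lineinfile changes only one matching line (the last match), so each value you manage needs its own task.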
OK, if you really, really want to avoid using templates, you could try to use the replace module and a regex like this:
- hosts: local
  tasks:
    - replace:
        path: testfile
        regexp: '^\[{{ item.category }}\]\s(.*)host(.*)$'
        replace: '[{{ item.category }}]\n host=[{{ item.host }}]'
      with_items:
        - { category: 'outputs.influxdb', host: 'http://cake.com:8080' }
This, in its current form, would not necessarily handle more than one option under each category, but the regex can be modified to handle multiple lines.
As required, it will not touch the # commented lines. However, if you later enable some of the previously inactive sections, you might end up with a slightly messier configuration file that includes the same instructions both commented and uncommented (this shouldn't impact functionality, only looks). You will also need to account for options that look like the example below (interleaved commented and uncommented values) and create regexes specifically for those use cases:
[section]
option1=['value']
# option2=['value']
option3=['value']
It highly depends on your use case, but my recommendation remains to use templates instead, as they are a more robust approach with fewer chances of things going wrong.
