Note: Please see the UPDATE section below. I've heavily expanded the question for clarity on what I'm trying to achieve, but added it as an addendum rather than rewriting the original.
As my infrastructure grows, adding input variables to my variables.tf files and then syncing those values to output variables in my outputs.tf file has become impossible to do manually. Not only does it take a lot of unnecessary time, but probably even more time is spent going back and fixing the ones that terraform validate tells me I missed through human error. This is especially true when building or using modules, whose arguments add an additional layer to manage.
There has to be a better way. Here is what I want to achieve.
Let's say I'm creating an Azure AKS Kubernetes cluster. The Terraform resource is azurerm_kubernetes_cluster.
Only 8 arguments are required to create a base install, but there are almost 250 additional ones, and they all have default values. Per the documentation page, they also already have fantastic descriptions. (I'm tired of copying and pasting them into my variable { description = "..." } blocks.)
The information is there in the documentation, and terraform plan also knows about every single additional argument, because each one of course comes up in the pre-apply plan. (known after apply) means it's optional but will get a default value.
In my dream world, I'd run this hypothetical command sequence:
terraform plan
terraform document <- Here it auto-generates every argument as a variable block and inserts it into variables.tf. It also auto-generates every possible output "out_putable" {} block and inserts it into outputs.tf.
terraform apply -update-inputs -update-outputs <- Here everything that was optional (known after apply) is now known, and it auto-updates variables.tf and outputs.tf accordingly. Adding an -update-modules flag lets it take care of the additional layer introduced by using modules.
This feels like a problem that has been solved before. Before I write a custom tool that parses the Terraform web docs and the output of terraform show, is there already a way to do this? terraform-docs is the closest I've come to a solution, but that targets README.md. If it can do what I need, I haven't figured it out yet.
How can I automate all this?
############
UPDATE
############
This article and video are spot-on when it comes to Terraform's evolution in an organization. My organization is somewhere between late-stage pattern 3 and early pattern 5. As we decompose our "Terralith", we have inconsistencies among teams (patterns, naming conventions, variable and argument choices, etc.). These are starting to cause errors in CI/CD, forcing a ticket-review process that is slowing things down.
All resources have required and optional arguments. But in my organization, we have, for example, additional optional arguments that are required for us.
Scenario: Dev A in Japan creates a resource, forgets an optional variable or two, or names them something obscure. Dev B in America is blocked until they can convene and discuss. Given time zones, language differences, and ticket review, this one issue is now delayed by a week or more.
I need to automate this and create exact consistency, so that Dev A starts out with exactly what Dev B would start with or is expecting, and with what the CI/CD tests are expecting (templating the initial process, if you will). In other words, I need to remove the human element of manually creating main.tf, variables.tf, outputs.tf, etc.
Here are thoughts on how to achieve this:
Use Golang to autogenerate the files by querying the API
How can I query the API to get a list of all required arguments for a specific resource?
I found that I can query for provider information, but I can't find a way to retrieve resource information. My thinking is that when a developer wants to create a new resource, he'll run a Go or TypeScript tool to generate the manifest files along with the expected naming conventions, and populate main.tf, variables.tf, outputs.tf, etc. with exactly the data that everyone is expecting. I'm looking for something like curl registry.terraform.io/providers/hashicorp/azurerm/v2.99/resource_group?required=yes, which would show me all required arguments along with descriptions and other info I can use straight from the API.
Use CDKTF to generate an HCL manifest.tf file from JSON
How can I use CDKTF to generate an HCL .tf file?
CDKTF is EXACTLY what I'm looking for - except in reverse. HCL is seamlessly compatible with JSON. Running cdktf synth creates ./out/cdk.tf.out. I'm so close! How do I turn that file into main.tf?
The goal here is to have a master file from which all future manifest files are derived. Whether we use azurerm_kubernetes_cluster 1 time or 1000 times, I know for certain that every argument, every variable name, and every desired output is exactly the same. If a change is needed in our desired structure, it will be updated at the JSON level, and CI/CD can ensure those changes are propagated across every instance of its use.
I know that I can use the cdk.tf.out file as a drop-in replacement for a module, but I don't want my team members to have to learn TypeScript or how to read JSON. If I can create a templatized JSON file containing exactly what I expect users to start with, and if they can run some command like cdktf convert cdk.tf.out --HCL output-file.tf, then I've accomplished my goal.
If cdktf synth can create an HCL JSON file, and cdktf convert can take a manifest.tf file and turn it into HCL JSON, can't it do the exact opposite and turn the HCL JSON file into a human-readable, declarative manifest.tf file?
Perhaps think of it this way. Terraform has a required file structure for a module if it's to be allowed into the module registry. I'm trying to create a similar required structure for each of the resources our organization uses regardless of when and where it's used.
If your goal is to derive input variables and output values from resource type schemas then Terraform can provide you with the information to do so.
In the working directory of a configuration that already uses the provider whose resource type you want to use, run the following command:
terraform providers schema -json
The result contains a JSON description of all of the resource types available in the providers for the current configuration, and for each one the metadata about its attributes, including the type constraint information and descriptions for each one.
From that you can generate whatever other files you need based on that information.
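As a minimal sketch of that generation step, here is one way to do it in Python 3, assuming the documented JSON layout of terraform providers schema -json (provider_schemas -> resource_schemas -> block -> attributes). The script name and the provider/resource constants are placeholders you would adjust; this is not an official Terraform tool.

# gen_variables.py - hypothetical helper, not an official Terraform command.
# Reads `terraform providers schema -json` from stdin and prints a variable
# block for every configurable attribute of one resource type.
import json
import sys

PROVIDER = "registry.terraform.io/hashicorp/azurerm"   # adjust to your provider
RESOURCE = "azurerm_kubernetes_cluster"                 # adjust to your resource

schemas = json.load(sys.stdin)
attrs = (schemas["provider_schemas"][PROVIDER]
         ["resource_schemas"][RESOURCE]["block"]["attributes"])

for name, attr in sorted(attrs.items()):
    if attr.get("computed") and not attr.get("optional"):
        continue  # read-only attribute: a candidate for outputs.tf, not an input
    # Only primitive type constraints map 1:1 onto HCL keywords in this sketch;
    # lists, maps and objects are emitted as `any` to keep it short.
    tf_type = attr.get("type") if isinstance(attr.get("type"), str) else "any"
    description = (attr.get("description") or "").replace('"', '\\"').replace("\n", " ")
    print(f'variable "{name}" {{')
    print(f'  type        = {tf_type}')
    print(f'  description = "{description}"')
    if not attr.get("required"):
        print("  default     = null")
    print("}")
    print()

You would run it as something like terraform providers schema -json | python3 gen_variables.py > variables.tf; the same walk over the attributes, keeping the computed ones instead, would give you the raw material for outputs.tf.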
Note that if you are intending to build modules which export the entire surface area (all inputs and all outputs) of a particular resource type the Terraform documentation explicitly recommends against this, suggesting to just use the resource type directly instead since such a module would often not offer sufficient benefit to outweigh the additional complexity and maintenance overhead it implies:
In principle any combination of resources and other constructs can be factored out into a module, but over-using modules can make your overall Terraform configuration harder to understand and maintain, so we recommend moderation.
A good module should raise the level of abstraction by describing a new concept in your architecture that is constructed from resource types offered by providers.
For example, aws_instance and aws_elb are both resource types belonging to the AWS provider. You might use a module to represent the higher-level concept "HashiCorp Consul cluster running in AWS" which happens to be constructed from these and other AWS provider resources.
We do not recommend writing modules that are just thin wrappers around single other resource types. If you have trouble finding a name for your module that isn't the same as the main resource type inside it, that may be a sign that your module is not creating any new abstraction and so the module is adding unnecessary complexity. Just use the resource type directly in the calling module instead.
I had the same question and developed a small bash script to create output definitions based on the module code.
The script requires the hcledit tool to extract blocks from HCL code.
#!/usr/bin/env bash
set -o pipefail

_hcledit=$(command -v hcledit)

# Emit an output block for every module and resource found in the *.tf files
# of the current directory; append the result to outputs.tf as needed.
for tf_file in *.tf; do
  "$_hcledit" block list < "$tf_file" | while read -r line; do
    block_type="${line%%.*}"
    line="${line#*.}"
    case $block_type in
      # Nothing to export for these block types
      locals|output|variable|data) continue ;;
      module)
        output_name=$line
        output_description="Module '$output_name' attributes"
        output_value="$block_type.$output_name"
        ;;
      resource)
        label_kind="${line%.*}"
        label_name="${line#*.}"
        output_name="${label_kind}_${label_name//[\-]/_}"
        output_description="Resource '$label_kind.$label_name' attributes"
        output_value="$label_kind.$label_name"
        ;;
    esac
    cat <<EOT
output "$output_name" {
  description = "$output_description"
  value       = $output_value
}

EOT
  done
done
I have an anchor as follows:
helm-install
docker-flags: &my_docker_flags
  - "--network host"
  - "--env KUBECONFIG=/tmp/admin.conf"
  - "--env HOME=${env.HOME}"
  - "--volume ${env.KUBECONFIG}:/tmp/admin.conf:ro"
  - "--volume ${env.PWD}:${env.PWD}"
  - "--volume ${env.HOME}/.helm:${env.HOME}/.helm"
  - "--volume ${var.docker_config_basepath}:${var.docker_config_basepath}"
later I want to do:
docker-flags:
  <<: *my_docker_flags
  - "--env K8_NAMESPACE=${env.K8_NAMESPACE}"
But the last line is flagged by the YAML linter as "bad indentation of a mapping entry".
The YAML merge key <<, defined here, is a feature defined for the outdated YAML 1.1. It has never been part of the spec itself, and thus its implementation is optional. Lots of YAML implementations implemented it, and it remains a feature even as they get updated for YAML 1.2, which doesn't define this feature.
As a "key", it is not a special syntax feature. Instead, much like the scalar true, it gets interpreted as something special because of its content. Supporting implementations will treat it according to the linked specification when it occurs as a key in a mapping.
However, what you are showing is a different data structure: it contains a sequence of items. There is no place to put a merge key there, so you cannot use this feature in a sequence.
Generally, YAML is not a data processing language. << was and is an exception to that; there are no other processing features, neither for merging sequences nor for other operations you would expect from a data processing language, such as string concatenation.
For this reason, lots of tools that heavily use YAML, such as Ansible or Helm, include some kind of template processing for their YAML input files. While far from perfect, templating is currently the most versatile way to do data processing in a YAML file.
If the tool that reads your YAML doesn't provide you with a templating engine, your only option is to pre-process the YAML file manually, for example using a simple templating engine like mustache. Whether that is feasible depends of course on the context.
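To make the mapping-only restriction concrete, here is a small sketch using Python with PyYAML (an assumption; the key names are made up). Merge keys combine mappings, but there is no equivalent operator for sequences, so list items have to be repeated or assembled by a pre-processing step:

import yaml  # PyYAML honours the YAML 1.1 merge key when loading mappings

doc = """
defaults: &defaults
  network: host
  kubeconfig: /tmp/admin.conf

job:
  <<: *defaults          # works: both sides are mappings
  namespace: dev
"""
print(yaml.safe_load(doc)["job"])
# {'network': 'host', 'kubeconfig': '/tmp/admin.conf', 'namespace': 'dev'}

# There is no such operator for sequences: to extend a list of flags you either
# repeat the items or build the final list outside of YAML (templating, a
# wrapper script, and so on).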
I want to edit the configuration file of Telegraf (the system metrics collecting agent).
Telegraf comes with a default config file which can be edited. There are many input and output plugins defined in there, which are commented out; they can be enabled by removing the comments and can also be customized.
I want to edit only some of the plugins defined there, not all of them. For example, consider this file:
[global]
interval='10s'
[outputs.influxdb]
host=['http://localhost:8086']
#[outputs.elasticsearch]
# host=['http://localhost:9200']
[inputs.netstat]
interface='eth0'
Now, I want to edit the 3 blocks global, outputs.influxdb, and inputs.netstat. I don't want to edit outputs.elasticsearch, but I do want the outputs.elasticsearch block to remain in the file.
Using Ansible, I first used the template module, but with that the commented-out data would be lost.
Then I used the ini_file module, but instead of editing the already-present block, it adds a new block even if one is already there, resulting in something like this:
[outputs.influxdb]
host=[http://localhost:8086]
[outputs.influxdb]
host=[http://xx.xx.xx.xx:8086]
Which module is ideal for my scenario?
There are several options, depending on your purpose.
The lineinfile module is the best option if you just want to add, replace, or remove one line.
The replace module is best if you want to add, replace, or delete several lines.
The blockinfile module can add several lines, surrounded by markers.
If you only want to change two or three lines, you could use that many calls to lineinfile. To change a whole config file, I would recommend, as the commenters suggest, using the template module.
Ok, if you really really want to avoid using templates, you could try to use replace and a regex like this:
- hosts: local
  tasks:
    - replace:
        path: testfile
        regexp: '^\[{{ item.category }}\]\s(.*)host(.*)$'
        replace: '[{{ item.category }}]\n host=[{{ item.host }}]'
      with_items:
        - { category: 'outputs.influxdb', host: 'http://cake.com:8080' }
This, in its current form, would not necessarily handle more than one option under each category, but the regex can be modified to handle multiple lines.
As required, it will not touch the # commented lines. However, if you decide to enable some of the previously inactive sections, you might end up with a slightly messier configuration file that would include the instructions both commented and uncommented (shouldn't impact functionality, only 'looks'). You will also need to account for options that look like the example below (interleaved commented/uncommented values) and create regexes specially for those use-cases:
[section]
option1=['value']
# option2=['value']
option3=['value']
It highly depends on your use case, but my recommendation remains that templates should be used instead, as they are a more robust approach with fewer chances of things going wrong.
What are the differences between YAML and JSON, specifically considering the following things?
Performance (encode/decode time)
Memory consumption
Expression clarity
Library availability, ease of use (I prefer C)
I was planning to use one of these two in our embedded system to store configuration files.
Related:
Should I use YAML or JSON to store my Perl data?
Technically YAML is a superset of JSON. This means that, in theory at least, a YAML parser can understand JSON, but not necessarily the other way around.
See the official specs, in the section entitled "YAML: Relation to JSON".
In general, there are certain things I like about YAML that are not available in JSON.
As #jdupont pointed out, YAML is visually easier to look at. In fact the YAML homepage is itself valid YAML, yet it is easy for a human to read.
YAML has the ability to reference other items within a YAML file using "anchors." Thus it can handle relational information as one might find in a MySQL database.
YAML is more robust about embedding other serialization formats such as JSON or XML within a YAML file.
In practice neither of these last two points will likely matter for things that you or I do, but in the long term, I think YAML will be a more robust and viable data serialization format.
Right now, AJAX and other web technologies tend to use JSON. YAML is currently being used more for offline data processes. For example, it is included by default in the C-based OpenCV computer vision package, whereas JSON is not.
You will find C libraries for both JSON and YAML. YAML's libraries tend to be newer, but I have had no trouble with them in the past. See for example Yaml-cpp.
Differences:
YAML, depending on how you use it, can be more readable than JSON
JSON is often faster and is probably still interoperable with more systems
It's possible to write a "good enough" JSON parser very quickly
Duplicate keys, which are potentially valid JSON, are definitely invalid YAML.
YAML has a ton of features, including comments and relational anchors. YAML syntax is accordingly quite complex, and can be hard to understand.
It is possible to write recursive structures in YAML: {a: &b [*b]}, which will loop infinitely in some converters (a short demo follows this list). Even with circular detection, a "YAML bomb" is still possible (see: XML bomb).
Because there are no references, it is impossible to serialize complex structures with object references in JSON. YAML serialization can therefore be more efficient.
In some coding environments, the use of YAML can allow an attacker to execute arbitrary code.
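A quick illustration of the recursive-structure point above, sketched in Python with PyYAML (assumed installed):

import json
import yaml  # PyYAML resolves the alias into a genuinely self-referencing list

data = yaml.safe_load("a: &b [*b]")
print(data["a"][0] is data["a"])  # True: the list contains itself

# A naive converter walking this structure never terminates, and even
# json.dumps(data) refuses with "ValueError: Circular reference detected".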
Observations:
Python programmers are generally big fans of YAML, because of the use of indentation, rather than bracketed syntax, to indicate levels.
Many programmers consider the attachment of "meaning" to indentation a poor choice.
If the data format will be leaving an application's environment, parsed within a UI, or sent in a messaging layer, JSON might be a better choice.
YAML can be used, directly, for complex tasks like grammar definitions, and is often a better choice than inventing a new language.
Bypassing esoteric theory
This answers the title, not the details, since most people (like me) just read the title from a Google search result, so I felt it was necessary to explain it from a web developer's perspective.
YAML uses space indentation, which is familiar territory for Python developers.
JavaScript developers love JSON because it is a subset of JavaScript and can be directly interpreted and written inside JavaScript, along with using a shorthand way to declare JSON, requiring no double quotes in keys when using typical variable names without spaces.
There are a plethora of parsers that work very well in all languages for both YAML and JSON.
YAML's space format can be much easier to look at in many cases because the formatting requires a more human-readable approach.
YAML's form, while more compact and easier to look at, can be deceptively difficult to hand-edit if you don't have whitespace formatting visible in your editor. Tabs are not spaces, which confuses things further if your editor doesn't interpret your keystrokes into spaces.
JSON is much faster to serialize and deserialize because it has significantly fewer features than YAML to check for, which enables smaller and lighter code to process it.
A common misconception is that YAML needs less punctuation and is more compact than JSON, but this is completely false. Whitespace is invisible, so it seems like there are fewer characters, but if you count the actual whitespace that is necessary for YAML to be interpreted properly, along with the proper indentation, you will find YAML actually requires more characters than JSON. JSON doesn't use whitespace to represent hierarchy or grouping, and it can easily be flattened, with unnecessary whitespace removed, for more compact transport.
The Elephant in the room: The Internet itself
JavaScript clearly dominates the web by a huge margin, and JavaScript developers overwhelmingly prefer JSON as the data format, as do the popular web APIs, so it becomes difficult to argue for using YAML over JSON when doing web programming in the general sense, as you will likely be outvoted in a team environment. In fact, the majority of web programmers aren't even aware YAML exists, let alone consider using it.
If you are doing any web programming, JSON is the default way to go because no translation step is needed when working with JavaScript so then you must come up with a better argument to use YAML over JSON in that case.
This question is 6 years old, but strangely, none of the answers really addresses all four points (speed, memory, expressiveness, portability).
Speed
Obviously this is implementation-dependent, but because JSON is so widely used, and so easy to implement, it has tended to receive greater native support, and hence speed. Considering that YAML does everything that JSON does, plus a truckload more, it's likely that of any comparable implementations of both, the JSON one will be quicker.
However, given that a YAML file can be slightly smaller than its JSON counterpart (due to fewer " and , characters), it's possible that a highly optimised YAML parser might be quicker in exceptional circumstances.
Memory
Basically the same argument applies. It's hard to see why a YAML parser would ever be more memory efficient than a JSON parser, if they're representing the same data structure.
Expressiveness
As noted by others, Python programmers tend towards preferring YAML, JavaScript programmers towards JSON. I'll make these observations:
It's easy to memorise the entire syntax of JSON, and hence be very confident about understanding the meaning of any JSON file. YAML is not truly understandable by any human. The number of subtleties and edge cases is extreme.
Because few parsers implement the entire spec, it's even harder to be certain about the meaning of a given expression in a given context.
The lack of comments in JSON is, in practice, a real pain.
Portability
It's hard to imagine a modern language without a JSON library. It's also hard to imagine a JSON parser implementing anything less than the full spec. YAML has widespread support, but is less ubiquitous than JSON, and each parser implements a different subset. Hence YAML files are less interoperable than you might think.
Summary
JSON is the winner for performance (if relevant) and interoperability. YAML is better for human-maintained files. HJSON is a decent compromise although with much reduced portability. JSON5 is a more reasonable compromise, with well-defined syntax.
GIT and YAML
The other answers are good. Read those first. But I'll add one other reason to use YAML sometimes: git.
Increasingly, many programming projects use git repositories for distribution and archival. And, while a git repo's history can equally store JSON and YAML files, the "diff" method used for tracking and displaying changes to a file is line-oriented. Since YAML is forced to be line-oriented, any small changes in a YAML file are easier to see by a human.
It is true, of course, that JSON files can be "made pretty" by sorting the strings/keys and adding indentation. But this is not the default and I'm lazy.
Personally, I generally use JSON for system-to-system interaction. I often use YAML for config files, static files, and tracked files. (I also generally avoid adding YAML relational anchors. Life is too short to hunt down loops.)
Also, if speed and space are really a concern, I don't use either. You might want to look at BSON.
I find YAML to be easier on the eyes: fewer parentheses, quotation marks, etc. Although there is the annoyance of tabs in YAML... one gets the hang of it.
In terms of performance/resources, I wouldn't expect big differences between the two.
Furthermore, we are talking about configuration files, so I wouldn't expect a high frequency of encode/decode activity, no?
Technically YAML offers a lot more than JSON (YAML v1.2 is a superset of JSON):
comments
anchors and inheritance - example of 3 identical items:
item1: &anchor_name
  name: Test
  title: Test title

item2: *anchor_name

item3:
  <<: *anchor_name
  # You may add extra stuff.
...
Most of the time people will not use those extra features and the main difference is that YAML uses indentation whilst JSON uses brackets. This makes YAML more concise and readable (for the trained eye).
Which one to choose?
YAML extra features and concise notation makes it a good choice for configuration files (non-user provided files).
JSON limited features, wide support, and faster parsing makes it a great choice for interoperability and user provided data.
If you don't need any features which YAML has and JSON doesn't, I would prefer JSON because it is very simple and is widely supported (has a lot of libraries in many languages). YAML is more complex and has less support. I don't think the parsing speed or memory use will be very much different, and maybe not a big part of your program's performance.
Benchmark results
Below are the results of a benchmark comparing YAML vs JSON loading times, in Python and Perl.
JSON is much faster, at the expense of some readability, and features such as comments
Test method
100 sequential runs on a fast machine, average number of seconds
The dataset was a 3.44MB JSON file, containing movie data scraped from Wikipedia
https://raw.githubusercontent.com/prust/wikipedia-movie-data/master/movies.json
Linked to from: https://github.com/jdorfman/awesome-json-datasets
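A rough reconstruction of that method in Python (a sketch, not the original harness; it assumes PyYAML is installed with libyaml for the CLoader case and that movies.json has been downloaded locally):

import json
import timeit
import yaml

with open("movies.json") as f:
    text = f.read()  # YAML is a superset of JSON, so both parsers accept the file

RUNS = 100

def avg_seconds(parse):
    return timeit.timeit(parse, number=RUNS) / RUNS

print("JSON:        ", avg_seconds(lambda: json.loads(text)))
print("YAML CLoader:", avg_seconds(lambda: yaml.load(text, Loader=yaml.CSafeLoader)))
print("YAML:        ", avg_seconds(lambda: yaml.safe_load(text)))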
Results
Python 3.8.3 timeit
JSON: 0.108
YAML CLoader: 3.684
YAML: 29.763
Perl 5.26.2 Benchmark::cmpthese
JSON XS: 0.107
YAML XS: 0.574
YAML Syck: 1.050
Perl 5.26.2 Dumbbench (Brian D Foy, excludes outliers)
JSON XS: 0.102
YAML XS: 0.514
YAML Syck: 1.027
From Arnaud Lauret's book, "The Design of Web APIs":
The JSON data format
JSON is a text data format based on how the JavaScript programming language describes data but is, despite its name, completely language-independent (see https://www.json.org/). Using JSON, you can describe objects containing unordered name/value pairs and also arrays or lists containing ordered values.
An object is delimited by curly braces ({}). A name is a quoted string ("name") and is separated from its value by a colon (:). A value can be a string like "value", a number like 1.23, a Boolean (true or false), the null value null, an object, or an array. An array is delimited by brackets ([]), and its values are separated by commas (,).
The JSON format is easily parsed using any programming language. It is also relatively easy to read and write. It is widely adopted for many uses such as databases, configuration files, and, of course, APIs.
YAML
YAML (YAML Ain't Markup Language) is a human-friendly data serialization format. Like JSON, YAML (http://yaml.org) is a key/value data format; the book compares the two side by side.
Note the following points:
There are no double quotes (" ") around property names and values in YAML.
JSON's structural curly braces ({}) and commas (,) are replaced by newlines and indentation in YAML.
Array brackets ([]) and commas (,) are replaced by dashes (-) and newlines in YAML.
Unlike JSON, YAML allows comments beginning with a hash mark (#).
It is relatively easy to convert one of those formats into the other. Be forewarned though, you will lose comments when converting a YAML document to JSON.
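For example, a round trip in Python (PyYAML assumed) shows why: comments are discarded when the YAML is parsed, so there is nothing left to carry over into the JSON output:

import json
import yaml

config = """
# how often metrics are collected
interval: 10s
outputs:
  - influxdb  # primary backend
"""
data = yaml.safe_load(config)      # comments are dropped at parse time
print(json.dumps(data, indent=2))  # {"interval": "10s", "outputs": ["influxdb"]}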
Since this question now features prominently when searching for YAML and JSON, it's worth noting one rarely-cited difference between the two: license. JSON purports to have a license which JSON users must adhere to (including the legally-ambiguous "shall be used for Good, not Evil"). YAML carries no such license claim, and that might be an important difference (to your lawyer, if not to you).
Sometimes you don't have to decide for one over the other.
In Go, for example, you can have both at the same time:
type Person struct {
    Name string `json:"name" yaml:"name"`
    Age  int    `json:"age" yaml:"age"`
}
I find both YAML and JSON to be very effective. Only two things really dictate when I use one over the other. First: which language is the format most popularly used with? For example, if I'm using Java or JavaScript, I'll use JSON. For Java, I'll use its own objects, which are pretty much JSON but lacking some features, and convert to JSON if I need to, or build it as JSON in the first place. I do that because it's a common approach in Java and makes it easier for other Java developers to modify my code. The second thing is whether the program needs to remember attributes or is receiving instructions in the form of a config file; in the latter case I'll use YAML, because it's very easily read by humans, has nice-looking syntax, and is very easy to modify even if you have no idea how YAML works. Then the program reads it and converts it to JSON, or whatever is preferred for that language.
In the end, it honestly doesn't matter. Both JSON and YAML are easily read by any experienced programmer.
If you are concerned about parsing speed, then storing the data in JSON is the better option. I had to parse data from a location where the file was subject to modification by other users, so I used YAML, as it provides better readability compared to JSON.
You can also add comments in a YAML file, which can't be done in a JSON file.
JSON encodes six data types: objects (mappings), arrays, strings, numbers, Booleans, and null. It is extremely easy for a machine to parse and provides very little flexibility. The specification is about a page and a half.
YAML allows the encoding of arbitrary Python data and other crazy crap (which leads to vulnerabilities when decoding it). It is hard to parse because it offers so much flexibility. The specification for YAML was 86 pages, the last time I checked. YAML syntax is obviously influenced by Python, but maybe they should have been a little more influenced by the Python philosophy on a few points: e.g. “there should be one—and preferably only one—obvious way to do it” and “simple is better than complex.”
The main benefit of YAML over JSON is that it’s easier for humans to read and edit, which makes it a natural choice for configuration files.
These days, I’m leaning towards TOML for configuration files. It’s not as pretty or as flexible as YAML, but it’s easier both for machines and humans to parse. The syntax is (almost) a superset of INI syntax, but it parses out to JSON-like data structures, adding only one additional type: the date type.
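A small illustration of that last point, using Python's standard-library tomllib (an assumption: it requires Python 3.11+; older versions need the third-party tomli package):

import tomllib  # standard library since Python 3.11

doc = """
title = "backup job"
last_run = 2024-01-31  # TOML's native date type, which JSON does not have

[database]
host = "localhost"
port = 5432
"""
cfg = tomllib.loads(doc)
print(cfg["last_run"], type(cfg["last_run"]))  # a datetime.date, not a string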
I want to parse nested configurations in Bash, like the one below:
[foo]
  [bar]
    key="value"
  [baz]
    key="value"
I tried this .ini parser but it does not support nesting. Later I found out that nesting isn't allowed in .ini files.
I searched for a YAML parser for Bash, but I couldn't find much. Nested configuration parsing in Bash seems like a basic problem to me, so I guess a trivial solution exists, but I could not find one. Does a trivial solution for parsing nested configuration in Bash exist? If yes, which one?
EDIT
I want to write a script/program for automated backup and restore of databases. The configuration needs to be flexible so that I can select databases on different hosts, with different users and passwords, and with different backup intervals. Oh, and I want to learn Bash. But I am starting to think that Bash is not the right tool for my problem.
Bash is not the right language for this. There are no nested arrays, and dynamic variable assignment is a bit of a minefield compared to languages like Python and Ruby. That said, it sounds like you're specifying the format and the parser yourself, so you could simply use a hierarchical naming scheme for your configuration:
foo_bar_key="value"
foo_baz_key="value"
I wrote a Yamlesque parser in response to this similar question.
It will parse
foo:
  bar:
    key: value
  baz:
    key: value
into bash associative arrays. 100% Bash, but it needs to be Bash 4.x.