Parsing a value from non-trivial JSON using Ansible's uri module - ansible

I have this (in the example shown I reduced it by removing many lines) non-trivial JSON retrieved from a Spark server:
{
"spark.worker.cleanup.enabled": true,
"spark.worker.ui.retainedDrivers": 50,
"spark.worker.cleanup.appDataTtl": 7200,
"fusion.spark.worker.webui.port": 8082,
"fusion.spark.worker.memory": "4g",
"fusion.spark.worker.port": 8769,
"spark.worker.timeout": 30
}
I try to read fusion.spark.worker.memory but fail to do so. In my debug statements I can see that the information is there:
msg: "Spark memory: {{spark_worker_cfg.json}}" shows this:
ok: [process1] => {
"msg": "Spark memory: {u'spark.worker.ui.retainedDrivers': 50, u'spark.worker.cleanup.enabled': True, u'fusion.spark.worker.port': 8769, u'spark.worker.cleanup.appDataTtl': 7200, u'spark.worker.timeout': 30, u'fusion.spark.worker.memory': u'4g', u'fusion.spark.worker.webui.port': 8082}"
}
The dump using var: spark_worker_cfg shows this:
ok: [process1] => {
"spark_worker_cfg": {
"changed": false,
"connection": "close",
"content_length": "279",
"content_type": "application/json",
"cookies": {},
"cookies_string": "",
"failed": false,
"fusion_request_id": "Pj2zeWThLw",
"json": {
"fusion.spark.worker.memory": "4g",
"fusion.spark.worker.port": 8769,
"fusion.spark.worker.webui.port": 8082,
"spark.worker.cleanup.appDataTtl": 7200,
"spark.worker.cleanup.enabled": true,
"spark.worker.timeout": 30,
"spark.worker.ui.retainedDrivers": 50
},
"msg": "OK (279 bytes)",
"redirected": false,
"server": "Jetty(9.4.12.v20180830)",
"status": 200,
"url": "http://localhost:8765/api/v1/configurations?prefix=spark.worker"
}
}
I can't access the value using {{spark_worker_cfg.json.fusion.spark.worker.memory}}; my problem seems to be caused by the names containing dots:
The task includes an option with an undefined variable. The error was:
'dict object' has no attribute 'fusion'
I have had a look at two SO posts (1 and 2) that look like duplicates of my question but could not derive from them how to solve my current issue.

The keys in the 'json' element of the data structure contain literal dots rather than representing nested structure. This causes issues, because Ansible will not treat them as literal when dotted notation is used. Therefore, use square bracket notation to reference them rather than dotted notation:
- debug:
    msg: "{{ spark_worker_cfg['json']['fusion.spark.worker.memory'] }}"
(At first glance this looked like an issue with a JSON-encoded string that needed decoding, which could have been handled with "{{ spark_worker_cfg.json | from_json }}".)
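
To illustrate why the two notations behave differently, here is a minimal Python sketch. The dotted_lookup helper is hypothetical and only imitates the segment-by-segment resolution that dotted notation performs; it is not Ansible's or Jinja2's actual implementation.

```python
# Trimmed copy of the registered result from the question.
cfg = {"json": {"fusion.spark.worker.memory": "4g", "fusion.spark.worker.port": 8769}}


def dotted_lookup(data, path):
    """Walk `path` one dot-separated segment at a time (hypothetical helper)."""
    for segment in path.split("."):
        data = data[segment]
    return data


# Bracket-style lookup treats the whole string as a single key.
memory = cfg["json"]["fusion.spark.worker.memory"]

# Dotted-style lookup resolves "json", then looks for a key named "fusion" and fails.
try:
    dotted_lookup(cfg, "json.fusion.spark.worker.memory")
    failed_on = None
except KeyError as exc:
    failed_on = exc.args[0]

print(memory, failed_on)
```

The failing segment is the same name that the Ansible error message complains about.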

You could use the json_query filter to get your results. https://docs.ansible.com/ansible/latest/user_guide/playbooks_filters.html
msg="{{ spark_worker_cfg.json | json_query('fusion.spark.worker.memory') }}"
edit:
In response to your comment, the fact that we get an empty string returned leads me to believe that the query isn't correct. It can be frustrating to find the exact query while using the json_query filter, so I usually use a jsonpath tool beforehand. I've linked one in my comment below, but I personally use the jsonUtils add-on in IntelliJ to find my path (which still needs adjustment, because paths are handled a bit differently between the two).
If your json looked like this:
{
value: "theValue"
}
then
json_query('value')
would work.
The path you're passing to json_query isn't correct for what you're trying to do.
If your top level object was named fusion_spark_worker_memory (without the periods), then your query should work. The dots are throwing things off, I believe. There may be a way to escape those in the query...
edit 2: clockworknet for the win! He beat me to it both times. :bow:

Related

Using parse_xml in an Ansible playbook

I've been trying to parse XML data in Ansible. I can get it to work using the xml module but I think that using parse_xml would better suit my needs.
I don't seem to be able to match any of the data in the xml with my specs file.
Here is the xml data:
<data xmlns=\"urn:ietf:params:xml:ns:netconf:base:1.0\" xmlns:nc=\"urn:ietf:params:xml:ns:netconf:base:1.0\">
<ntp xmlns=\"http://cisco.com/ns/yang/Cisco-IOS-XR-ip-ntp-oper\">
<nodes>
<node>
<node>0/0/CPU0</node>
<associations>
<is-ntp-enabled>true</is-ntp-enabled>
<sys-leap>ntp-leap-no-warning</sys-leap>
<peer-summary-info>
<peer-info-common>
<host-mode>ntp-mode-client</host-mode>
<is-configured>true</is-configured>
<address>10.1.1.1</address>
<reachability>0</reachability>
</peer-info-common>
<time-since>-1</time-since>
</peer-summary-info>
<peer-summary-info>
<peer-info-common>
<host-mode>ntp-mode-client</host-mode>
<is-configured>true</is-configured>
<address>172.16.252.29</address>
<reachability>255</reachability>
</peer-info-common>
<time-since>991</time-since>
</peer-summary-info>
</associations>
</node>
</nodes>
</ntp>
</data>
This is what the spec file looks like:
---
vars:
  ntp_peers:
    address: "{{ item.address }}"
    reachability: "{{ item.reachability}}"
keys:
  result:
    value: "{{ ntp_peers }}"
    top: data/ntp/nodes/node/associations
    items:
      address: peer-summary-info/peer-info-common/address
      reachability: peer-summary-info/peer-info-common/reachability
and the task in the yaml file:
- name: parse ntp reply
  set_fact:
    parsed_ntp_data: "{{ NTP_STATUS.stdout | parse_xml('specs/iosxr_ntp.yaml') }}"
but the data does not return any results:
TASK [debug parsed_ntp_data] **************************************************************************************************************************************************************************
ok: [core-rtr01] => {
"parsed_ntp_data": {
"result": []
}
}
ok: [dist-rtr01] => {
"parsed_ntp_data": {
"result": []
}
}
I had never seen parse_xml before, so that was a fun adventure.
There appear to be two things conspiring against you: the top: key is evaluated from the root Element, and your XML (unlike the rest of the examples) uses XML namespaces (the xmlns= bit), which means your XPaths have to be encoded in the Element.findall manner.
For the first part, since Element.findall is run while sitting on the <data> Element, one cannot reference data/... in an XPath, because that would apply to a structure like <data><data>. I tried being sneaky by making the XPath absolute (/data/...), but Python's XPath library throws up in that circumstance. So, at the very least, your top: key must not start with data.
Then, the xmlns= in your snippet stood out to me, because it means those elements' names are actually NS+":"+localName for every element, and thus an XPath of ntp does NOT match ns0:ntp because they're considered completely separate names (that being the point of the namespace, after all). It may very well be possible to use enough //*[local-name() = "ntp"] silliness to avoid having to specify the namespace over and over, but I didn't try it.
Again, as a concession to Python's XPath library, the fully qualified name is encoded in an XPath as {the-namespace}local-name, and there does not seem to be any way, short of modifying network.py, to pass in namespaces :-(
Thus, the "hello world" version that I used to confirm my theory:
vars:
  ntp_peers:
    address: "{{ item.address }}"
keys:
  result:
    value: "{{ ntp_peers }}"
    top: '{http://cisco.com/ns/yang/Cisco-IOS-XR-ip-ntp-oper}ntp/{http://cisco.com/ns/yang/Cisco-IOS-XR-ip-ntp-oper}nodes/{http://cisco.com/ns/yang/Cisco-IOS-XR-ip-ntp-oper}node'
    items:
      address: '{http://cisco.com/ns/yang/Cisco-IOS-XR-ip-ntp-oper}node'
cheerfully produced
ok: [localhost] => {
"msg": {
"result": [
{
"address": "0/0/CPU0"
}
]
}
}
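
The namespace behavior described above can be reproduced outside Ansible with Python's standard xml.etree.ElementTree. This is only a sketch on a trimmed copy of the XML from the question; the q helper is a hypothetical convenience and not part of parse_xml.

```python
import xml.etree.ElementTree as ET

NS = "http://cisco.com/ns/yang/Cisco-IOS-XR-ip-ntp-oper"

XML = """<data xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <ntp xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ip-ntp-oper">
    <nodes><node><node>0/0/CPU0</node>
      <associations>
        <peer-summary-info><peer-info-common>
          <address>10.1.1.1</address><reachability>0</reachability>
        </peer-info-common></peer-summary-info>
        <peer-summary-info><peer-info-common>
          <address>172.16.252.29</address><reachability>255</reachability>
        </peer-info-common></peer-summary-info>
      </associations>
    </node></nodes>
  </ntp>
</data>"""


def q(path):
    """Prefix every step of a slash-separated path with {NS} (hypothetical helper)."""
    return "/".join("{%s}%s" % (NS, step) for step in path.split("/"))


root = ET.fromstring(XML)  # root is the <data> element, so paths must not start with data

# Without the namespace nothing matches, which mirrors the empty "result": [].
unqualified = root.findall("ntp/nodes/node/associations")

# With every step qualified, both peers are found.
peer_path = q("ntp/nodes/node/associations/peer-summary-info/peer-info-common")
peers = [
    {
        "address": info.findtext(q("address")),
        "reachability": info.findtext(q("reachability")),
    }
    for info in root.findall(peer_path)
]

print(unqualified, peers)
```

Note that the values come back as strings, so reachability is "0" and "255" rather than integers.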

Use list of dictionaries variable on Ansible Tower textarea survey

I'm trying to develop a playbook where I have the following variable.
disk_vars:
- { Unit: C, Size: 50 }
- { Unit: D, Size: 50 }
With the variables defined in the playbook there is no problem, but when I try to use a textarea survey on Ansible Tower I cannot manage to parse them as a list of dictionaries.
I tried adding the following two lines, which are already in YAML format, to the survey.
- { Unit: C, Size: 50 }
- { Unit: D, Size: 50 }
And in my vars section I use test_var: "{{ test_var1.split('\n') }}", which converts the output into a two-line string. Without the split it is just a single-line string.
I could make my playbook work with a simple dictionary like
dict1: {{ Unit: C, Size: 50 }}
but I'm having issues parsing it as well.
EDIT
Changing it to the following as suggested by mdaniels works.
- set_fact:
    test_var: "{{ test_var1 | from_yaml }}"
- name: test
  debug: msg=" hostname is {{ item.Unit }} and {{ item.Size }}"
  with_items:
    - "{{ test_var }}"
I'm trying to figure out a way to clean up the data input, as asking users to respect the format is not a very good idea.
I tried changing the input data to the following, but I could not figure out how to format that into a list of dictionaries.
disk_vars:
Unit: C, Size: 50
Unit: D, Size: 50
I tried with the following piece of code
- set_fact:
    db_list: >-
      {{ test_var1.split("\n") | select |
         map("regex_replace", "^", "- {") |
         map("regex_replace", "$", "}") |
         join("\n") }}
But it puts everything on a single line.
"db_list": "- {dbid: 1, dbname: abc\ndbid: 2, dbname: xyz} "
I have tried to play with it but could not manage to make it work.
I believe you were very close; instead of "{{ test_var1.split('\n') }}" you can just feed it to the from_yaml filter:
- set_fact:
    test_var1: '{{ test_var1 | from_yaml }}'
  # this is just to simulate the **str** that you will receive from the textarea
  vars:
    test_var1: "- { Unit: C, Size: 50 }\n- { Unit: D, Size: 50 }\n"
- debug:
    msg: and now test_var1[0].Unit is {{ test_var1[0].Unit }}
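
For the looser one-entry-per-line input mentioned in the question (Unit: C, Size: 50), the parsing logic could look roughly like the Python sketch below, for example as the basis of a custom filter. The function name parse_disk_lines is hypothetical, and the sketch assumes that values never contain commas or colons.

```python
def parse_disk_lines(text):
    """Turn 'Unit: C, Size: 50' lines into a list of dicts (hypothetical helper)."""
    disks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the textarea
        entry = {}
        for pair in line.split(","):
            key, _, value = pair.partition(":")
            value = value.strip()
            # Keep purely numeric values as integers, everything else as text.
            entry[key.strip()] = int(value) if value.isdigit() else value
        disks.append(entry)
    return disks


# Simulates the string a textarea survey would hand to the playbook.
survey_text = "Unit: C, Size: 50\nUnit: D, Size: 50\n"
disk_vars = parse_disk_lines(survey_text)
print(disk_vars)
```

The result has the same shape as the disk_vars list defined directly in the playbook.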
I faced a similar dilemma: I was bound to the survey formats available, and I was forced to use mdaniels' suggested solution above, sending the data as text and then parsing it from YAML later. The problem, however, was that controlling the format of the input (i.e. a YAML string inside the text) would probably cause a lot of headaches and errors, just like you describe.
Maybe you really need to use the Survey, but in my case I was more interested in calling the Job Template using the Tower REST API. For some reason I thought I then had to have a survey with all parameters defined. But it turned out I was wrong: with a survey in place, I was not able to provide dictionaries as input data (in the extra_vars). However, after removing the Survey and also (not sure if required or not) enabling "Extra Variables -> prompt on launch", things started to work! Now I can provide lists and dictionaries as input to my Templates when calling them using REST API POST calls; see the example below:
{
"extra_vars": {
"p_db_name": "MYSUPERDB",
"p_appl_id": "MYD32",
"p_admin_user": "myadmin",
"p_admin_pass": "mysuperpwd",
"p_db_state": "present",
"p_tablespaces": [
{
"name": "tomahawk",
"size": "10M",
"bigfile": true,
"autoextend": true,
"next": "1M",
"maxsize": "20M",
"content": "permanent",
"state": "present"
}
],
"p_users": [
{
"schema": "myschema",
"password": "Mypass123456#",
"default_tablespace": "tomahawk",
"state": "present",
"grants": "'create session', 'create any table'"
}
]
}
}
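
As a rough sketch of how such a POST call could be assembled with Python's standard library: the host name, template ID and token below are hypothetical placeholders, and the sketch assumes the job template launch endpoint of the Tower v2 API and OAuth2 bearer-token authentication. The request is only built here, not sent.

```python
import json
import urllib.request


def build_launch_request(tower_url, template_id, token, extra_vars):
    """Build (but do not send) a job template launch request."""
    url = "%s/api/v2/job_templates/%d/launch/" % (tower_url.rstrip("/"), template_id)
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumes token-based auth is enabled
        },
    )


# All three connection values are made up for illustration.
request = build_launch_request(
    "https://tower.example.com",
    42,
    "hypothetical-token",
    {"p_db_name": "MYSUPERDB", "p_tablespaces": [{"name": "tomahawk", "size": "10M"}]},
)
print(request.full_url)
# urllib.request.urlopen(request) would actually submit the job.
```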

Listing more than 100 records in Route 53 using Ansible

I'm currently using the route53_facts module on a project. I have 250 record sets in one hosted zone. I'm having difficulty with listing all record sets in that zone. The Route 53 API works by returning pages of maximum 100 records at a time. In order to retrieve the next page, you must pass the NextRecordName response value to the route53_facts module's start_record_name: field (pretty straightforward).
The issue I'm having specifically is getting Ansible to do this. Presumably one would do this using a loop, e.g. in pseudocode:
start
get 100 records
do until response does not contain NextRecordName:
get 100 records (start_record_name=NextRecordName)
end
In Ansible, I have written the below task to do this:
- block:
    - name: List record sets in a given hosted zone
      route53_facts:
        query: record_sets
        hosted_zone_id: "/hostedzone/ZZZ1111112222"
        max_items: 100
        start_record_name: "{{ record_sets.NextRecordName | default(omit) }}"
      register: record_sets
      until: record_sets.NextRecordName is not defined
  when: "'{{ hosted_zone['Name'] }}' == 'test.example.com.'"
...however, this does not work as expected. Instead of continuously paging through responses until no more records are left, it repeatedly returns the first 100 records ("the first page").
As I can see from the Ansible debug output, start_record_name: is repeatedly null:
"attempts": 2,
"changed": false,
"invocation": {
"module_args": {
"aws_access_key": null,
"aws_secret_key": null,
"change_id": null,
"delegation_set_id": null,
"dns_name": null,
"ec2_url": null,
"health_check_id": null,
"health_check_method": "list",
"hosted_zone_id": "/hostedzone/ZZZ1111112222",
"hosted_zone_method": "list",
"max_items": "100",
"next_marker": null,
"profile": null,
"query": "record_sets",
"region": null,
"resource_id": null,
"security_token": null,
"start_record_name": null,
"type": null,
"validate_certs": true
}
},
...my guess is that the | default(omit) filter is always being executed. In other words, record_sets.NextRecordName is never initialized at this point in the task.
I'm hoping somebody can assist me in getting Ansible to return all records from a zone in Route 53. I think I've gotten tangled up in Ansible's looping behavior. Thanks!
Caveat this with "as best I can tell:"
To answer your question, it actually seems that until: and register: do not interact in the same way that when: and register: do. The best explanation I have is that until: behaves like a database transaction: it rolls back the register: assignment if the conditional is false, meaning that when the body of the until: task is tried again, it uses the same parameters as the first time. The only thing which keeps an until: block from being an infinite loop is the retries: value.
So, in your specific case, I think this will do the job:
- name: initial record_set
  route53_facts:
    # bootstrap so the upcoming "when:" will evaluate correctly
  register: record_facts
- set_fact:
    # capture the initial answer
    records0: '{{ record_facts.ResourceRecordSets }}'
- name: rest of them
  route53_facts:
    start_record_name: '{{ record_facts.NextRecordName }}'
  register: record_facts
  when: record_facts.NextRecordName | default("")
  with_sequence: count=10
- set_fact:
    all_records: >-
      {{ records0 + (record_facts.results |
         selectattr("ResourceRecordSets", "defined") |
         map(attribute="ResourceRecordSets") | list) }}
The with_sequence: is a hack because loop: (for which with_* is syntactic sugar) needs a list of items over which to iterate. Because the responses that come back without NextRecordName cause the when: to fail and the iteration to be skipped, items 3 through 10 (in your case) resolve almost immediately.
Then you just need to pull out the actual response data from the now list of route53_facts: replies, and glue them to the initial one to get the complete list.
Having said all of that, I am now convinced that the behavior of route53_facts: (and any other AWS module that pushes the burden of that iteration into the playbook) is a bug. The module caller already has a max_items: available to them; that its value can't be larger than some arbitrary pagination cut-off is an implementation detail.
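
For reference, the paging loop from the question's pseudocode looks like this in plain Python. This is only a sketch: fetch_page is a hypothetical callable standing in for the Route 53 call, and fake_fetch simulates a 250-record zone with pages of 100 so the loop can be run without AWS access.

```python
def list_all_records(fetch_page):
    """Keep requesting pages until a response has no NextRecordName."""
    records = []
    start = None  # None means "start from the beginning of the zone"
    while True:
        page = fetch_page(start)
        records.extend(page["ResourceRecordSets"])
        start = page.get("NextRecordName")
        if start is None:
            return records


# Simulated zone contents, already in sorted order.
ZONE = ["host%03d.test.example.com." % i for i in range(250)]


def fake_fetch(start_record_name, page_size=100):
    """Hypothetical stand-in for the real API call; returns one page at a time."""
    first = 0 if start_record_name is None else ZONE.index(start_record_name)
    page = {"ResourceRecordSets": ZONE[first:first + page_size]}
    if first + page_size < len(ZONE):
        page["NextRecordName"] = ZONE[first + page_size]
    return page


all_records = list_all_records(fake_fetch)
print(len(all_records))
```

The key difference from the until: attempt is that the start value is updated from each response before the next request is made.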

JMESPathTypeError when using json_query filter in Ansible with starts_with

I am trying to filter results that arrived from boto3 in Ansible.
When I use json query on the results without the "[?starts_with(...)]" it works well, but when adding the starts_with syntax:
"state_machines[?starts_with(name,'hello')].state_machine_arn"
In order to filter results:
{u'boto3': u'1.4.4', u'state_machines':
[{u'state_machine_arn': u'<state machine arn 1>', u'name': u'hello_world_sfn', u'creation_date': u'2017-05-16 14:26:39.088000+00:00'},
{u'state_machine_arn': u'<state machine arn 2>', u'name': u'my_private_sfn', u'creation_date': u'2017-06-08 07:25:49.931000+00:00'},
{u'state_machine_arn': u'<state machine arn 3>', u'name': u'alex_sfn', u'creation_date': u'2017-06-14 08:35:07.123000+00:00'}],
u'changed': True}" }
I expect to get the first state_machine_arn value: "state machine arn 1"
But instead, I get the exception:
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: JMESPathTypeError: In function contains(), invalid type for value: <lamdba_name>, expected one of: ['array', 'string'], received: "unknown" fatal: [localhost]: FAILED!
=> {"failed": true, "msg": "Unexpected failure during module execution.", "stdout": ""}
What can be the problem?
The problem is that json_query filter expects to get a dictionary with ascii strings, but what you're providing it are unicode strings (notice the u'blabla' in your input).
This is an issue with json_query that apparently got introduced in Ansible 2.2.1 (although that is not really clear), here are some more details:
https://github.com/ansible/ansible/issues/20379#issuecomment-284034650
I hope this gets fixed in a future version, but for now this is a workaround that worked for us:
"{{ results | to_json | from_json | json_query(jmespath_query) }}"
Where jmespath_query is a variable that contains a starts_with query.
This trick of going to and from json turns the unicode strings into ASCII ones :)
As an alternative to writing | to_json | from_json | json_query(…) everywhere, you can monkey-patch Ansible's json_query filter by creating the following filter_plugins/json_bug_workaround.py file:
import json
from ansible.parsing.ajson import AnsibleJSONEncoder
from ansible.plugins.filter.json_query import json_query

class FilterModule(object):
    def filters(self):
        return {
            # Workaround for Unicode bug https://stackoverflow.com/a/44547305
            'json_query': lambda data, query: json_query(
                json.loads(json.dumps(data, cls=AnsibleJSONEncoder)),
                query
            ),
        }
Then you can just use | json_query(…) naturally. This shim does the equivalent of calling | to_json | from_json for you.
You can put it inside your role (roles/role_name/filter_plugins/json_bug_workaround.py) or anywhere in Ansible's plugin search path.
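
The effect of the JSON round trip can be seen in isolation with the sketch below. It rests on an assumption that goes beyond what the answers above state: that the offending values are instances of a string subclass whose type name the JMESPath type check does not recognize (hence the received: "unknown" in the error). TaggedText is a hypothetical stand-in for such a class.

```python
import json


class TaggedText(str):
    """Hypothetical stand-in for a str subclass wrapping text values."""


data = {"state_machines": [{"name": TaggedText("hello_world_sfn")}]}
before = type(data["state_machines"][0]["name"]).__name__

# Serializing and re-parsing rebuilds every value from the built-in types.
clean = json.loads(json.dumps(data))
after = type(clean["state_machines"][0]["name"]).__name__

print(before, after)
```

The content is unchanged by the round trip; only the concrete types of the values differ.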

Displaying a custom name for a host

I have an Ansible playbook for working with EC2 instances. I'm using dynamic inventory (ec2.py) to get the group of instances that I want to work with (hosts: tag_Service_Foo). When I run it, it produces output like:
GATHERING FACTS ***************************************************************
ok: [54.149.9.198]
ok: [52.11.22.29]
ok: [52.11.0.3]
However, I can fetch the "Name" tag for a particular instance from Amazon (I do this and store it in a variable for use in a couple parts of the playbook).
Is there a way to get Ansible to use this string for the hostname when displaying progress? I'd like to see something more descriptive (since I don't have the IPs memorized):
GATHERING FACTS ***************************************************************
ok: [main-server]
ok: [extra-server]
ok: [my-cool-server]
The output of the ec2.py inventory script looks like this (truncated; it's very long).
{
"_meta": {
"hostvars": {
"54.149.9.198": {
"ec2__in_monitoring_element": false,
"ec2_ami_launch_index": "0",
"ec2_architecture": "x86_64",
"ec2_client_token": "xxx",
"ec2_dns_name": "xxx",
"ec2_ebs_optimized": false,
"ec2_eventsSet": "",
"ec2_group_name": "",
"ec2_hypervisor": "xen",
"ec2_id": "i-xxx",
"ec2_image_id": "ami-xxx",
"ec2_instance_type": "xxx",
"ec2_ip_address": "xxx",
"ec2_item": "",
"ec2_kernel": "",
"ec2_key_name": "xxx",
"ec2_launch_time": "xxx",
"ec2_monitored": xxx,
"ec2_monitoring": "",
"ec2_monitoring_state": "xxx",
"ec2_persistent": false,
"ec2_placement": "xxx",
"ec2_platform": "",
"ec2_previous_state": "",
"ec2_previous_state_code": 0,
"ec2_private_dns_name": "xxx",
"ec2_private_ip_address": "xxx",
"ec2_public_dns_name": "xxx",
"ec2_ramdisk": "",
"ec2_reason": "",
"ec2_region": "xxx",
"ec2_requester_id": "",
"ec2_root_device_name": "/dev/xvda",
"ec2_root_device_type": "ebs",
"ec2_security_group_ids": "xxx",
"ec2_security_group_names": "xxx",
"ec2_sourceDestCheck": "true",
"ec2_spot_instance_request_id": "",
"ec2_state": "running",
"ec2_state_code": 16,
"ec2_state_reason": "",
"ec2_subnet_id": "subnet-xxx",
"ec2_tag_Name": "main-server",
"ec2_tag_aws_autoscaling_groupName": "xxx",
"ec2_virtualization_type": "hvm",
"ec2_vpc_id": "vpc-xxx"
}
}
}
"tag_Service_Foo": [
"54.149.9.198",
"52.11.22.29",
"52.11.0.3"
],
}
What you need to do is create your own wrapper (say my_ec2.py) around ec2.py that post-processes its output. The idea is to use the behavioral hostvar ansible_ssh_host. You can use any language, not only Python; as long as it prints valid JSON on stdout you're good to go. Reference if needed.
It'll be a tiny bit of work, but I hope the pseudocode helps:
output_json_map = new map
for each group in <ec2_output>: # e.g. tag_Service_Foo, I think there would be another
                                # key in the output that contains list of group names.
    for each ip_address in group:
        hname = ec2_output._meta.hostvars.find(ip_address).find(ec2_tag_Name)
        # Add new host to the group member list
        output_json_map.add(key=group, value=hname)
        copy all vars from ec2_output._meta.hostvars.<ip_address>
            to output_json_map._meta.hostvars.<hname>
        # Assign the IP address of this host to the ansible_ssh_host
        # in hostvars for this host
        output_json_map.add(key=_meta.hostvars.<hname>.ansible_ssh_host,
                            value=ip_address)
        output_json_map.add(key=_meta.hostvars.find(ip_address).ansible_ssh_host,
                            value=ip_address)
print output_json_map to stdout
E.g. for your example the output of my_ec2.py should be:
{
"_meta": {
"hostvars": {
"main-server": {
"ansible_ssh_host": "54.149.9.198"
--- snip ---
"ec2_tag_Name": "main-server",
--- snip ---
},
"extra-server": {
"ansible_ssh_host": "52.11.22.29"
--- snip ---
"ec2_tag_Name": "extra-server",
--- snip ---
},
<other hosts from all groups>
}
}
"tag_Service_Foo": [
"main-server",
"extra-server",
<other hosts in this group>
],
"some other group": [
<hosts in this group>,
...
],
}
and obviously, use this my_ec2.py instead of ec2.py as the inventory file. :-)
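
A minimal Python sketch of the transformation step of such a wrapper might look like this. The function name rename_hosts is hypothetical, and the sketch assumes that groups are plain lists of addresses (as in the truncated ec2.py output above); a real wrapper would additionally run ec2.py, load its JSON output, and print the result with json.dumps.

```python
import json


def rename_hosts(inventory, name_var="ec2_tag_Name"):
    """Re-key hosts by their Name tag and keep the address in ansible_ssh_host."""
    hostvars = inventory.get("_meta", {}).get("hostvars", {})

    def display_name(address):
        # Fall back to the address when the host has no usable Name tag.
        return hostvars.get(address, {}).get(name_var) or address

    result = {"_meta": {"hostvars": {}}}
    for address, variables in hostvars.items():
        renamed = dict(variables)  # copy all vars of the original host
        renamed["ansible_ssh_host"] = address
        result["_meta"]["hostvars"][display_name(address)] = renamed

    for group, members in inventory.items():
        if group == "_meta":
            continue
        if isinstance(members, list):
            result[group] = [display_name(address) for address in members]
        else:
            result[group] = members  # unknown group shape, pass through untouched
    return result


sample = {
    "_meta": {"hostvars": {
        "54.149.9.198": {"ec2_tag_Name": "main-server"},
        "52.11.22.29": {"ec2_tag_Name": "extra-server"},
    }},
    "tag_Service_Foo": ["54.149.9.198", "52.11.22.29", "52.11.0.3"],
}
converted = rename_hosts(sample)
print(json.dumps(converted, indent=2))
```

This sketch does not handle two instances sharing the same Name tag; in that case the later host would overwrite the earlier one in hostvars.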
-- edit --
1) In the groups, can I only refer to things by one name? 2) There's
no notion of an alias? 3) I'm wondering if I could use the IP addr in
the groups and just modify the _meta part or if I need to do it all?
Yes*, No and no.
* Technically first yes should be no. Let me explain.
What we are doing here can be done with static inventory file like this:
Original ec2.py was returning json equivalent of following inventory file:
[tag_Service_Foo]
54.149.9.198 ec2_tag_Name="main-server" ec2_previous_state_code="0" ...
52.11.22.29 ec2_tag_Name="extra-server" ec2_previous_state_code="0" ...
our new my_ec2.py returns this:
[tag_Service_Foo]
main-server ansible_ssh_host="54.149.9.198" ec2_tag_Name="main-server" ec2_previous_state_code="0" ...
extra-server ansible_ssh_host="52.11.22.29" ec2_tag_Name="extra-server" ec2_previous_state_code="0" ...
# Technically it's possible to create "an alias" for main-server like this:
main-server-alias ansible_ssh_host="54.149.9.198" ec2_tag_Name="main-server" ec2_previous_state_code="0" ...
Now you would be able to run a play with main-server-alias in the host list and ansible would execute it on 54.149.9.198.
BUT, and this is a big BUT, when you run a play with 'all' as the host pattern ansible would run the task on main-server-alias as well as main-server. So what you created is an alias in one context and a new host in another. I've not tested this BUT part so do come back and correct me if you find out otherwise.
HTH
If you put
vpc_destination_variable = Name
in your ec2.ini file, that should work too.
In more recent versions (working since Ansible 2.9, at least), the aws_ec2 plugin allows this as a simple configuration, without the need to create any wrapper around ec2.py.
This can be done using a combination of the parameters hostnames and compose.
So the trick is to change the name of the EC2 instance via the hostnames parameter to have something human readable — for example from an AWS tag on the instance — and then to use the compose parameter to set the ansible_host to the private_ip_address, public_ip_address, private_dns_name or the public_dns_name to allow the connection, still.
Here would be the minimal configuration of the plugin to allow this:
plugin: aws_ec2
hostnames:
  - tag:Name
  # ^-- Given that you indeed have a tag named "Name" on your EC2 instances
compose:
  ansible_host: public_dns_name
Mind that, in AWS, tags are not unique, so you can have multiple instances tagged with the same name, whereas in Ansible the hostname must be unique. This means that you will probably be better off composing the name from tag:Name and a suffix that is truly unique, like the instance ID.
Something like:
plugin: aws_ec2
hostnames:
  - name: 'instance-id'
    separator: '_'
    prefix: 'tag:Name'
compose:
  ansible_host: public_dns_name

Resources