Debugging Ansible modules - ansible

I tried to write an Ansible module. My module is buggy. When I run it from a playbook I get the following unreadable output:
$ ansible-playbook lacp.yml
PLAY [xxxxxxxx] ****************************************************************
TASK [Test that my module works] ***********************************************
fatal: [xxxxxxxx]: FAILED! => {"changed": false, "failed": true, "module_stderr": "couldn't set locale correctly\ncouldn't set locale correctly\ncouldn't set locale correctly\nTraceback (most recent call last):\n File \"/tmp/ansible_vHkWq8/ansible_module_lacp.py\", line 27, in <module>\n main()\n File \"/tmp/ansible_vHkWq8/ansible_module_lacp.py\", line 14, in main\n m = re.match('^key: ([0-9]+) ', dladm.readline())\nAttributeError: 'Popen' object has no attribute 'readline'\ndladm: insufficient privileges\n", "module_stdout": "", "msg": "MODULE FAILURE", "parsed": false}
NO MORE HOSTS LEFT *************************************************************
[WARNING]: Could not create retry file 'lacp.retry'. [Errno 2] No such file or directory: ''
PLAY RECAP *********************************************************************
xxxxxxxx : ok=0 changed=0 unreachable=0 failed=1
How to stop Ansible quoting error messages with JSON? Or is there another way to debug Ansible modules?

How to stop Ansible quoting error messages with JSON?
You can use the human_log.py callback plugin to make Ansible interpret and print newline characters in its output.
Put the file into the /path/to/callback_plugins/ directory and add the following to ansible.cfg:
[defaults]
callback_plugins = /path/to/callback_plugins/
The detailed instructions are in the Human-Readable Ansible Playbook Log Output Using Callback Plugin blog post.
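Where installing a callback plugin is not an option, the escaped traceback can also be unescaped by hand. The sketch below is not part of human_log.py; it assumes you have copied the JSON object that follows `FAILED! =>` into a string, and the helper name `readable_failure` is made up for illustration.

```python
import json


def readable_failure(result_json):
    """Return the msg/stdout/stderr fields of a task result with real newlines."""
    result = json.loads(result_json)
    parts = []
    for key in ("msg", "module_stdout", "module_stderr"):
        value = result.get(key)
        if value:
            # json.loads has already turned the \n escapes into newlines.
            parts.append("--- {} ---\n{}".format(key, value.rstrip()))
    return "\n".join(parts)


if __name__ == "__main__":
    # Shortened copy of the failure shown in the question.
    sample = (
        '{"changed": false, "failed": true, '
        '"module_stderr": "Traceback (most recent call last):\\n'
        "AttributeError: 'Popen' object has no attribute 'readline'\\n\", "
        '"module_stdout": "", "msg": "MODULE FAILURE"}'
    )
    print(readable_failure(sample))
```

Empty fields such as `module_stdout` are skipped, so only the parts that carry information are printed.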

You can look into this and this callback plugins (for Ansible 2.x).
You will need to modify them a bit, because they don't convert module_stderr out of the box.
You may also want to run the playbook with ANSIBLE_KEEP_REMOTE_FILES=1, then ssh to the remote box, debug your module in place, and save the fixed version to your Ansible library.

Debugging Ansible modules quickly becomes very time-consuming, and next to impossible, unless you follow the recommended approach.
The recommended approach is to build your Ansible code in very small steps. That way, each time you add something to code you have already verified, it is easier to guess what went wrong.
So when you state that the module is buggy, you have already gone too far. You will be searching for a needle in the haystack that Ansible undoubtedly is.
Refactoring is not really a practical option. You basically start fresh, recreating your code step by step.
I hope you noticed that Ansible doesn't even bother to format error output in a human-readable way. In effect, on any error Ansible outputs the same message: something went wrong.
Let's say I have this Ansible task
- name: Mymodule
  mymodule:
    something: "something"
My module also is simple enough
#!/usr/bin/python

from ansible.module_utils.basic import *

def somefunction(data):
    has_changed = False
    meta = { "something": "something"}
    return (has_changed, meta)

def main():
    fields = {
        "something": {"required": True, "type": "str"},
        "state": {
            "default": "perform",
            "choices": ["perform"],
            "type": "str"
        },
    }

    choice_map = {
        "perform2": somefunction,
    }

    module = AnsibleModule(argument_spec=fields)
    has_changed, result = choice_map.get(module.params['state'])(module.params)
    module.exit_json(changed=has_changed, meta=result)

if __name__ == '__main__':
    main()
Ansible will produce the following error message
TASK [backup : Mymodule]
******************************************************* fatal: [myapp]: FAILED! => {"changed": false, "module_stderr": "Shared
connection to 127.0.0.1 closed.\r\n", "module_stdout": "Traceback
(most recent call last):\r\n File
\"/home/vagrant/.ansible/tmp/ansible-tmp-1570188887.99-191548982937437/AnsiballZ_mymodule.py\",
line 114, in \r\n _ansiballz_main()\r\n File
\"/home/vagrant/.ansible/tmp/ansible-tmp-1570188887.99-191548982937437/AnsiballZ_mymodule.py\",
line 106, in _ansiballz_main\r\n invoke_module(zipped_mod,
temp_path, ANSIBALLZ_PARAMS)\r\n File
\"/home/vagrant/.ansible/tmp/ansible-tmp-1570188887.99-191548982937437/AnsiballZ_mymodule.py\",
line 49, in invoke_module\r\n imp.load_module('main', mod,
module, MOD_DESC)\r\n File
\"/tmp/ansible_mymodule_payload_XTaVPp/main.py\", line 29, in
\r\n File
\"/tmp/ansible_mymodule_payload_XTaVPp/main.py\", line 25, in
main\r\nTypeError: 'NoneType' object is not callable\r\n", "msg":
"MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}
The "message" we should focus on is
TypeError: 'NoneType' object is not callable
It is caused by the wrong key perform2 in choice_map; it should be perform. Because module.params['state'] is 'perform', choice_map.get() finds no match and returns None, and calling None raises the TypeError. A simple typo.
    choice_map = {
        "perform2": somefunction,
    }
The typo is in the module file mymodule.py on line 21. The files and line numbers in the traceback (114, 106, 49, 29, 25) might be useful in some way, but how is not clear at all.
This is just a very simple example to illustrate the haystack. Ansible does not format the error message in a human-readable way. Reporting the problem file and line number is not an exact science either. And the error message itself is not useful: it should say that my choice_map has no entry for the requested action, and it could list the available choices.
IMHO this is a common problem with Ansible. A typing mistake can take an hour to fix.
The only way to work around this limitation is to build up your provisioning step by step. Baby steps.
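One way to get the kind of error message asked for above is to check the lookup before calling its result. The following is a minimal sketch, not Ansible's own behaviour; `resolve_action` is a hypothetical helper, and in a real module its message would presumably be handed to `module.fail_json(msg=...)` instead of being raised.

```python
def somefunction(data):
    has_changed = False
    meta = {"something": "something"}
    return (has_changed, meta)


def resolve_action(choice_map, state):
    """Return the handler for state, or raise an error that names the valid keys."""
    handler = choice_map.get(state)
    if handler is None:
        raise ValueError(
            "no handler for state '{}'; available: {}".format(
                state, ", ".join(sorted(choice_map))
            )
        )
    return handler


if __name__ == "__main__":
    choice_map = {"perform2": somefunction}  # same typo as in the module above
    try:
        resolve_action(choice_map, "perform")
    except ValueError as exc:
        print(exc)
```

With the typo in place, the message names both the requested state and the keys that actually exist, which points straight at choice_map rather than at a NoneType.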

Related

Parsing value from non-trivial JSON using Ansibles uri module

I have this (in the example shown I reduced it by removing many lines) non-trivial JSON retrieved from a Spark server:
{
"spark.worker.cleanup.enabled": true,
"spark.worker.ui.retainedDrivers": 50,
"spark.worker.cleanup.appDataTtl": 7200,
"fusion.spark.worker.webui.port": 8082,
"fusion.spark.worker.memory": "4g",
"fusion.spark.worker.port": 8769,
"spark.worker.timeout": 30
}
I try to read fusion.spark.worker.memory but fail to do so. In my debug statements I can see that the information is there:
msg: "Spark memory: {{spark_worker_cfg.json}}" shows this:
ok: [process1] => {
"msg": "Spark memory: {u'spark.worker.ui.retainedDrivers': 50, u'spark.worker.cleanup.enabled': True, u'fusion.spark.worker.port': 8769, u'spark.worker.cleanup.appDataTtl': 7200, u'spark.worker.timeout': 30, u'fusion.spark.worker.memory': u'4g', u'fusion.spark.worker.webui.port': 8082}"
}
The dump using var: spark_worker_cfg shows this:
ok: [process1] => {
"spark_worker_cfg": {
"changed": false,
"connection": "close",
"content_length": "279",
"content_type": "application/json",
"cookies": {},
"cookies_string": "",
"failed": false,
"fusion_request_id": "Pj2zeWThLw",
"json": {
"fusion.spark.worker.memory": "4g",
"fusion.spark.worker.port": 8769,
"fusion.spark.worker.webui.port": 8082,
"spark.worker.cleanup.appDataTtl": 7200,
"spark.worker.cleanup.enabled": true,
"spark.worker.timeout": 30,
"spark.worker.ui.retainedDrivers": 50
},
"msg": "OK (279 bytes)",
"redirected": false,
"server": "Jetty(9.4.12.v20180830)",
"status": 200,
"url": "http://localhost:8765/api/v1/configurations?prefix=spark.worker"
}
}
I can't access the value using {{spark_worker_cfg.json.fusion.spark.worker.memory}}; my problem seems to be caused by the key names containing dots:
The task includes an option with an undefined variable. The error was:
'dict object' has no attribute 'fusion'
I have had a look at two SO posts (1 and 2) that look like duplicates of my question but could not derive from them how to solve my current issue.
The keys in the 'json' element of the data structure contain literal dots rather than representing nested structure. This causes problems, because with dotted notation Ansible has no way to know that the dots are literal. Use square bracket notation to reference such keys instead:
- debug:
    msg: "{{ spark_worker_cfg['json']['fusion.spark.worker.memory'] }}"
(At first glance this looked like an issue with a JSON-encoded string that needed decoding, which could have been handled with: "{{ spark_worker_cfg.json | from_json }}")
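The difference between the two notations can be mimicked in plain Python. This is only an illustration of the lookup logic, not how Jinja2 is implemented; `dotted_lookup` is a made-up helper that splits a path on dots, which is effectively what the dotted notation does.

```python
def dotted_lookup(data, path):
    """Walk data one dot-separated segment at a time."""
    current = data
    for segment in path.split("."):
        current = current[segment]  # raises KeyError on the first missing segment
    return current


spark_worker_cfg = {
    "status": 200,
    "json": {
        "fusion.spark.worker.memory": "4g",
        "fusion.spark.worker.port": 8769,
    },
}

# Bracket-style access treats the whole key, dots included, as one name.
memory = spark_worker_cfg["json"]["fusion.spark.worker.memory"]
print(memory)

# Dotted access looks for a nested key called "fusion" and fails.
try:
    dotted_lookup(spark_worker_cfg, "json.fusion.spark.worker.memory")
except KeyError as exc:
    print("missing key:", exc)
```

The failing segment is "fusion", which matches the "'dict object' has no attribute 'fusion'" error reported in the question.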
You could use the json_query filter to get your results. https://docs.ansible.com/ansible/latest/user_guide/playbooks_filters.html
msg="{{ spark_worker_cfg.json | json_query('fusion.spark.worker.memory') }}"
edit:
In response to your comment: the fact that an empty string is returned leads me to believe that the query isn't correct. It can be frustrating to find the exact query for the json_query filter, so I usually try it in a JSONPath tool beforehand. I've linked one in my comment below, but I personally use the jsonUtils add-on in IntelliJ to find my path (which still needs adjustment, because paths are handled a bit differently between the two).
If your json looked like this:
{
  "value": "theValue"
}
then
json_query('value')
would work.
The path you're passing to json_query isn't correct for what you're trying to do.
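If your top-level key was named fusion_spark_worker_memory (without the periods), then your query would work. The dots are throwing things off, I believe. JMESPath, which json_query uses, does let you wrap a key in double quotes to treat it as a single identifier, for example json_query('"fusion.spark.worker.memory"').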
edit 2: clockworknet for the win! He beat me to it both times. :bow:

Win_service "Cannot start service InstallerService on computer '.'."

I'm running into an issue and I'm not sure what the cause is. Here is the setup:
Following the Ansible win_service doc
Debian GNU/Linux buster/sid
ansible 2.7.5
config file = /home/ansible/ansibleGalaxy/ansible.cfg
configured module search path = [u'/home/ansible/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.15+ (default, Nov 28 2018, 16:27:22) [GCC 8.2.0]
Playbook:
- name: Update Dalet Installer service
  win_service:
    name: DaletInstallerService
    username: .\Administrator
    password: toto
    start_mode: auto
    state: started
Basically, I want to update the credentials of the service on the target machine.
Whatever settings I use, I get this error message:
fatal: [192.168.56.103]: FAILED! => {
"can_pause_and_continue": false,
"changed": false,
"depended_by": [],
"dependencies": [
"Afd",
"Tcpip"
],
"description": "DaletInstallerService",
"desktop_interact": false,
"display_name": "DaletInstallerService",
"exists": true,
"msg": "Service 'DaletInstallerService (DaletInstallerService)' cannot be started due to the following error: Cannot start service DaletInstallerService on computer '.'.",
"name": "DaletInstallerService",
"path": "'C:\\Program Files (x86)\\DALET\\DaletInstaller\\DaletInstallerService.prunsrv.exe' //RS//DaletInstallerService",
"start_mode": "auto",
"state": "stopped",
"username": ".\\Administrator"
}
ERROR! Unexpected Exception, this is probably a bug: 'ascii' codec can't encode character u'\xa0' in position 29: ordinal not in range(128)
the full traceback was:
Traceback (most recent call last):
File "/usr/bin/ansible-playbook", line 118, in <module>
exit_code = cli.run()
File "/usr/lib/python2.7/dist-packages/ansible/cli/playbook.py", line 122, in run
results = pbex.run()
File "/usr/lib/python2.7/dist-packages/ansible/executor/playbook_executor.py", line 156, in run
result = self._tqm.run(play=play)
File "/usr/lib/python2.7/dist-packages/ansible/executor/task_queue_manager.py", line 291, in run
play_return = strategy.run(iterator, play_context)
File "/usr/lib/python2.7/dist-packages/ansible/plugins/strategy/linear.py", line 325, in run
results += self._wait_on_pending_results(iterator)
File "/usr/lib/python2.7/dist-packages/ansible/plugins/strategy/__init__.py", line 712, in _wait_on_pending_results
results = self._process_pending_results(iterator)
File "/usr/lib/python2.7/dist-packages/ansible/plugins/strategy/__init__.py", line 135, in inner
dbg.cmdloop()
File "/usr/lib/python2.7/dist-packages/ansible/plugins/strategy/__init__.py", line 1166, in cmdloop
cmd.Cmd.cmdloop(self)
File "/usr/lib/python2.7/cmd.py", line 130, in cmdloop
line = raw_input(self.prompt)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 29: ordinal not in range(128)
Still looking for a workaround or a fix, but any input would be welcome to understand the root cause.
Matth

Building custom module Ansible

I am trying to build a custom module for our private cloud Infrastructure.
I followed this doc http://docs.ansible.com/ansible/latest/dev_guide/developing_modules_general.html
I created a my_module.py module file.
When I hit ansible-playbook playbook/my_module.yml
Response:
PLAY [Create, Update, Delete VM] *********************************************************************************************************
TASK [Gathering Facts] ************************************************************************************************************************
ok: [localhost]
TASK [VM create] *************************************************************************************************************************
changed: [localhost]
TASK [dump test output] ***********************************************************************************************************************
ok: [localhost] => {
"msg": {
"changed": true,
"failed": false,
"message": "goodbye",
"original_message": "pcp_vm_ansible"
}
}
PLAY RECAP ************************************************************************************************************************************
localhost : ok=3 changed=1 unreachable=0 failed=0
Which means it is working fine as expected.
Module.py
from ansible.module_utils.basic import AnsibleModule
def run_module():
    module_args = dict(
        name=dict(type='str', required=True),
        new=dict(type='bool', required=False, default=False)
    )
    result = dict(
        changed=False,
        original_message='',
        message=''
    )
    module = AnsibleModule(
        argument_spec=module_args,
        supports_check_mode=True
    )
    if module.check_mode:
        # Exit through exit_json so that check mode also emits the JSON result.
        module.exit_json(**result)
    result['original_message'] = module.params['name']
    result['message'] = 'goodbye'
    if module.params['new']:
        result['changed'] = True
    if module.params['name'] == 'fail me':
        module.fail_json(msg='You requested this to fail', **result)
    module.exit_json(**result)
def main():
    print("================== Main Called =======================")
    run_module()
if __name__ == '__main__':
    main()
I am trying to print logs to visualize my input data using print() or even logging.
print("================== Main Called =======================")
But nothing is getting printed to console.
As per Conventions, Best Practices, and Pitfalls, "Modules must output valid JSON only. The top level return type must be a hash (dictionary) although they can be nested. Lists or simple scalar values are not supported, though they can be trivially contained inside a dictionary."
Effectively, the core runtime only communicates with the module via JSON, and it controls stdout, so standard print statements from the module are suppressed. If you want or need more information out of the execution runtime, I suggest a callback plugin.
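Since everything the module reports has to travel inside that JSON dictionary, one low-tech way to see intermediate values is to put them in the result itself. The sketch below imitates the contract with the standard library only, so it runs without Ansible; the `debug_log` key and the `build_result` helper are invented for illustration. In a real module the same dictionary would be passed to `module.exit_json(**result)` and could then be shown by a debug task on the registered variable.

```python
import json


def build_result(name, new):
    """Build the module result, collecting debug notes instead of printing them."""
    debug_log = []
    debug_log.append("main called with name={!r}, new={!r}".format(name, new))

    result = {
        "changed": bool(new),
        "original_message": name,
        "message": "goodbye",
        # Hypothetical extra key: any JSON-serialisable value can ride along.
        "debug_log": debug_log,
    }
    return result


if __name__ == "__main__":
    # A module is expected to write one JSON document like this to stdout.
    print(json.dumps(build_result("pcp_vm_ansible", True)))
```

Keeping the notes in a list means they arrive in order and survive the round trip through JSON unchanged.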

bypass untrusted sites with selenium3

I am stuck for good. I use the Python client of Selenium 3.0 with Mozilla Firefox 49.0.2 and PhantomJS 2.1.1.
The problem is that the site does not have valid certificates, so the browser gets stuck on a "Your connection is not secure" page. I can't remember what was different before, but this used to work with the following code:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.accept_untrusted_certs = True
profile.assume_untrusted_cert_issuer = False
profile.set_preference("network.proxy.no_proxies_on","localhost,127.0.0.1,"+url)
driver = webdriver.Firefox(profile)
driver.get('https://'+url+':'+port)
but not anymore. I must have upgraded Firefox at some point. By the way, the error is:
Traceback (most recent call last):
File "seleniumtest.py", line 75, in <module>
open_browser(url,port,"user","rZMBlg4ZpOX")
File "seleniumtest.py", line 52, in open_browser
driver.get(g)
File "/home/iob/Envs/selenium/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
self.execute(Command.GET, {'url': url})
File "/home/iob/Envs/selenium/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
self.error_handler.check_response(response)
File "/home/iob/Envs/selenium/local/lib/python2.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 196, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Error loading page
I found that Firefox has used Marionette as the default FirefoxDriver since Firefox 38, I think. I don't know whether this has anything to do with the problem. I tried to debug the issue and look at the response, which looks like this:
{u'sessionId': u'7334d50d-f188-4c70-be4e-3d1440cf21ec', u'value': {u'processId': 32334, u'browserVersion': u'49.0.2', u'takesScreenshot': True, u'acceptSslCerts': False, u'appBuildId': u'20161025170400', u'XULappId': u'{ec8030f7-c20a-464f-9b0e-13a3a9e97384}', u'javascriptEnabled': True, u'raisesAccessibilityExceptions': False, u'specificationLevel': 0, u'platform': u'LINUX', u'browserName': u'firefox', u'version': u'', u'proxy': {}, u'marionette': True, u'rotatable': False, u'device': u'desktop', u'takesElementScreenshot': True, u'platformName': u'linux', u'platformVersion': u'4.4.0-47-generic', u'command_id': 1}
I noticed u'acceptSslCerts': False, so I thought this is what I need to set for it to work. But I am not sure exactly how, and whatever I tried doesn't seem to work (e.g. capabilities["acceptSslCerts"] = True).
With PhantomJS I use the code below:
service_args = ['--proxy=https://10.241.226.200:5601',
'--proxy-type=https',
'--proxy-auth='+username+':'+password,
'--ignore-ssl-errors=true', '--ssl-protocol=any']
driver=webdriver.PhantomJS(service_args=service_args)
driver.get("https://"+url+":"+port)
That doesn't raise any error, but it doesn't return the title either, so I assume something is wrong and the link didn't open properly.
Any ideas? Any link for further reading would be helpful.

"pymysql.err.InternalError: (1046, u'No database selected')" error message when running Python script from command line

I'm trying to run a Python script that connects to a MySQL database through PyMySQL. The script is effectively:
import pymysql
cnx = pymysql.connect(read_default_file = "/directory/my.cnf", cursorclass = pymysql.cursors.DictCursor)
cursor = cnx.cursor()
# Do stuff.
When I run the script in the interpreter, I don't get any errors, but when I try to run it from the command line, I get the following error:
Traceback (most recent call last):
File "s02_prepare_data_RNN.py", line 264, in <module>
(omniture, urls, years, global_regions) = get_omniture_data("omniture_results")
File "s02_prepare_data_RNN.py", line 76, in get_omniture_data
sso_to_accountid = get_sso_accountids()
File "s02_prepare_data_RNN.py", line 31, in get_sso_accountids
cursor.execute(query)
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/cursors.py", line 134, in execute
result = self._query(query)
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/cursors.py", line 282, in _query
conn.query(q)
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/connections.py", line 768, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/connections.py", line 929, in _read_query_result
result.read()
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/connections.py", line 1125, in read
first_packet = self.connection._read_packet()
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/connections.py", line 893, in _read_packet
packet.check_error()
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/connections.py", line 369, in check_error
err.raise_mysql_exception(self._data)
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/err.py", line 120, in raise_mysql_exception
_check_mysql_exception(errinfo)
File "/home/rdu/malcorn/.local/lib/python2.6/site-packages/pymysql/err.py", line 115, in _check_mysql_exception
raise InternalError(errno, errorvalue)
pymysql.err.InternalError: (1046, u'No database selected')
Modifying my code from:
import pymysql
cnx = pymysql.connect(read_default_file = "/directory/my.cnf", cursorclass = pymysql.cursors.DictCursor)
cursor = cnx.cursor()
# Do stuff.
to:
import pymysql
cnx = pymysql.connect(read_default_file = "/directory/my.cnf", cursorclass = pymysql.cursors.DictCursor)
cursor = cnx.cursor()
if __name__ == "__main__":
    # Do stuff.
fixed the error. I had the thought that Python might be trying to execute the queries before the connection was established, so I tried putting the main part of my program under if __name__ == "__main__": and that fixed it. I'm still not 100% sure what is going on, though. I had assumed the code would wait for the connection to be established before proceeding to the following lines, but this fix suggests that's not the case.
It's also worth noting that I was only getting the error when running the original script from the command line on a server that has Python 2.6. When I ran the original script from the command line on my local machine that has Python 2.7, I did not get the error.
Anyway, if __name__ == "__main__": is good Python style, so I'll make sure to use it in the future.
Sometimes this simply happens when the database URI is not built correctly. For me, the error occurred when my pymysql connection string (URL) looked like this:
mysql+pymysql://:@localhost/
To fix it, it needed to look like:
mysql+pymysql://<my-database-user-name>:@localhost/<my-database-name> # or replace localhost with your particular host name
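A small guard can catch this before the URL ever reaches the driver. The helper below is a sketch using only the standard library; `build_mysql_url` and its parameters are hypothetical names, and the layout it assumes is the usual user:password@host/database form.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, database):
    """Assemble a mysql+pymysql URL, refusing to leave out user or database."""
    if not user:
        raise ValueError("database user name is empty")
    if not database:
        raise ValueError("database name is empty")
    # quote_plus keeps characters such as '@' or '/' in credentials from breaking the URL.
    return "mysql+pymysql://{}:{}@{}/{}".format(
        quote_plus(user), quote_plus(password), host, database
    )


if __name__ == "__main__":
    print(build_mysql_url("app_user", "s3cr@t", "localhost", "app_db"))
```

Failing early with a ValueError is arguably easier to trace than a 1046 error raised later from inside a query.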
If you don't select the database when connecting, you must create it (if needed) and select it afterwards:
connection = pymysql.connect(host=DB_host, user=DB_user, password=DB_password) # NO DB_NAME SPECIFIED
cursor = connection.cursor()
cursor.execute(f'CREATE DATABASE IF NOT EXISTS {DB_NAME}') # ADD THIS LINE!
connection.select_db(DB_NAME)

Resources