Ansible playbook fails to lock apt - ansible

I took over a project that is running on Ansible for server provisioning and management. I'm fairly new to Ansible but thanks to the good documentation I'm getting my head around it.
Still, I'm running into an error with the following output:
failed: [build] (item=[u'software-properties-common', u'python-pycurl', u'openssh-server', u'ufw', u'unattended-upgrades', u'vim', u'curl', u'git', u'ntp']) => {"failed": true, "item": ["software-properties-common", "python-pycurl", "openssh-server", "ufw", "unattended-upgrades", "vim", "curl", "git", "ntp"], "msg": "Failed to lock apt for exclusive operation"}
The playbook is run with sudo: yes so I don't understand why I'm getting this error (which looks like a permission error). Any idea how to trace this down?
- name: "Install very important packages"
apt: pkg={{ item }} update_cache=yes state=present
with_items:
- software-properties-common # for apt repository management
- python-pycurl # for apt repository management (Ansible support)
- openssh-server
- ufw
- unattended-upgrades
- vim
- curl
- git
- ntp
playbook:
- hosts: build.url.com
  sudo: yes
  roles:
    - { role: postgresql, tags: postgresql }
    - { role: ruby, tags: ruby }
    - { role: build, tags: build }

I just had the same issue on a new VM. I tried many approaches, including retrying the apt commands, but in the end the only way to do this was by removing unattended upgrades.
I'm using raw commands here, since at this point the VM doesn't have Python installed yet; I need to install it first, and for that I need a reliable apt.
Since it is a VM and I was testing the playbook by resetting it to a snapshot, the system date was off, which forced me to use date -s to avoid SSL certificate problems during apt commands. That date -s call is what triggered an unattended upgrade.
So this playbook snippet is the part relevant to disabling unattended upgrades; these are the first commands I issue on a new system.
- name: Disable timers for unattended upgrade, so that none will be triggered by the `date -s` call.
  raw: systemctl disable --now {{item}}
  with_items:
    - 'apt-daily.timer'
    - 'apt-daily-upgrade.timer'
- name: Reload systemctl daemon to apply the new changes
  raw: systemctl daemon-reload
# Syncing time is only relevant for testing, because of the VM's outdated date.
#- name: Sync time
#  raw: date -s "{{ lookup('pipe', 'date') }}"
- name: Wait for any possibly running unattended upgrade to finish
  raw: systemd-run --property="After=apt-daily.service apt-daily-upgrade.service" --wait /bin/true
- name: Purge unattended upgrades
  raw: apt-get -y purge unattended-upgrades
- name: Update apt cache
  raw: apt-get -y update
- name: If needed, install Python
  raw: test -e /usr/bin/python || apt-get -y install python
Anything else would cause apt commands to randomly fail because of locking issues caused by unattended upgrades.

This is a very common situation when provisioning Ubuntu (and likely some other distributions). You try to run Ansible while automatic updates are running in the background (which is what happens right after setting up a new machine). As APT uses a lock to guarantee exclusive access, Ansible gets kicked out.
The playbook is ok, and the easiest way to verify that is to run it again later (after the automatic update process finishes).
For a permanent resolution, you might want to:
use an OS image with automatic updates disabled
add an explicit loop in the Ansible playbook to repeat the failed task until it succeeds (see the sketch below)
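A minimal sketch of that retry approach applied to the failing task from the question (the retries/delay values are arbitrary, and the until/is success syntax assumes a reasonably recent Ansible version):
- name: "Install very important packages"
  apt: pkg={{ item }} update_cache=yes state=present
  register: apt_result
  # keep retrying for up to 10 * 30s while apt is still locked
  until: apt_result is success
  retries: 10
  delay: 30
  with_items:
    - software-properties-common
    - python-pycurl
    - openssh-server
    - ufw
    - unattended-upgrades
    - vim
    - curl
    - git
    - ntp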

Related

Problems when installing git on my targets

I have some problems when I try to install git on my target node.
1st method: I used the ansible command
ansible <ip-node> -u root -b -K -m raw -a "apt install -y git"
and I have this error on my controller node terminal:
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
2nd method: I ran the following playbook:
- name: This sets up git
  hosts: vm2
  tasks:
    - name: install git
      apt:
        name: git
        state: present
        cache_update: True
I get another error:
fatal: [192.168.57.10]: FAILED! => {"changed": false, "msg": "Unsupported parameters for (apt) module: cache_update. Supported parameters include: allow_downgrade (allow-downgrade, allow-downgrades, allow_downgrades), policy_rc_d, autoremove, force_apt_get, update_cache_retry_max_delay, fail_on_autoremove, install_recommends (install-recommends), update_cache_retries, default_release (default-release), state, autoclean, cache_valid_time, only_upgrade, deb, purge, allow_unauthenticated (allow-unauthenticated), lock_timeout, upgrade, dpkg_options, package (name, pkg), force, update_cache (update-cache)."})
My questions are:
How can I debug the above errors?
I'm not sure but I suspect my errors happen because my target node does not have internet access. For example, ping google.fr fails. Could this be the issue?
What should I change in my target node network configuration to fix my issues?
Note that you used cache_update; it should be update_cache, as the error message tells you. Tip: the package module chooses the package manager (apt, yum, etc.) on your target machines for you.
- name: Install git
  become: true
  become_method: sudo
  become_user: root
  package:
    name: git
    state: present
    update_cache: true
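If you also want to check your suspicion about missing internet access, one option (a suggestion of mine, not part of the answer above) is to run the failing command directly through the raw module and read the full apt output:
ansible <ip-node> -u root -b -K -m raw -a "apt-get update"
If apt-get update cannot reach the repositories, the problem is in the target's network configuration (default gateway, DNS or proxy) rather than in the playbook.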

How can I run local-exec provisionner AFTER cloud-init / user_data?

I'm experiencing a race condition issue on Terraform when running an Ansible playbook with the local-exec provisioner. At one point, that playbook has to install an APT package.
But first, I'm running a cloud-config file init.yml specified in the user_data argument that installs a package as well.
Consequently, I'm getting the following error:
Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
How can I prevent this?
# init.yml
runcmd:
  - sudo apt-get update
  - sudo apt-get -y install python python3

# main.tf
resource "digitalocean_droplet" "hotdog" {
  image     = "ubuntu-18-04-x64"
  name      = "my_droplet"
  region    = "FRA1"
  size      = "s-1vcpu-1gb"
  user_data = file("init.yml")

  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i '${self.ipv4_address},' ./playbook.yml"
  }
}
Disclaimer: my Terraform knowledge is quite sparse compared to my Ansible one. The below should work, but there might be Terraform-centric options I totally missed.
A very easy solution is to use an until loop so as to retry the task until it succeeds.
- name: retry apt task every 5s during 1mn until it succeeds (e.g. lock is released)
  apt:
    name: my_package
  register: apt_install
  until: apt_install is success
  delay: 5
  retries: 12
A better approach would be to make sure there is no lock in place on the different dpkg lock files. I did not do the exercise of implementing this in Ansible, and you might need a specific script or custom module to succeed. If you want to give it a try, there is a question with a solution on Server Fault.
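For illustration only, here is a minimal sketch of that idea (my own adaptation, not the Server Fault answer), assuming fuser is available on the target (it ships with the psmisc package on Debian/Ubuntu):
- name: Wait until no process holds the usual apt/dpkg lock files
  # fuser exits non-zero when none of the listed files is in use,
  # so rc != 0 means the locks are free
  command: fuser /var/lib/dpkg/lock-frontend /var/lib/dpkg/lock /var/lib/apt/lists/lock
  register: apt_locks
  failed_when: false
  changed_when: false
  until: apt_locks.rc != 0
  retries: 12
  delay: 5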
In your context, and since this problem actually seems quite common, I searched a bit and came across this issue on GitHub. I think my adaptation below will meet your requirements and might help with any other possible race condition inside your init phase.
Modify your user_data as:
---
runcmd:
  - touch /tmp/user-init-running
  - sudo apt-get update
  - sudo apt-get -y install python python3
  - rm /tmp/user-init-running
And in your playbook:
- name: wait for init phase to end. Error after 1mn
wait_for:
path: /tmp/user-init-running
state: absent
delay: 60
- name: install package
apt:
name: mypkg
state: present

How to force Ansible to use sudo to install packages?

I have a playbook that runs roles and logs into the server with a user that has sudo privileges. The problem is that, when switching to this user, I still need to use sudo to, say, install packages.
i.e.:
sudo yum install httpd
However, Ansible seems to ignore that and tries to install packages without sudo, which results in a failure.
Ansible will run the following:
yum install httpd
This is the role that I use:
tasks:
- name: Import du role 'memcacheExtension'
import_role:
name: memcacheExtension
become: yes
become_method: sudo
become_user: "{{become_user}}"
become_flags: '-i'
tags:
- never
- memcached
And this is the task that fails in my context:
- name: Install Memcached
  yum:
    name: memcached.x86_64
    state: present
Am I setting the sudo parameter at the wrong place? Or am I doing something wrong?
Thank you in advance
You can specify become: yes in a few places. Often it is used at the task level, and sometimes it is used as a command line parameter (--become, -b: run operations with become). It can also be set at the play level:
- hosts: servers
  become: yes
  become_method: enable
  tasks:
    - name: Hello
      ...
You can also enable it in group_vars:
group_vars/example.yml
ansible_become: yes
For your example, installing software, I would set it at the task level. I think in your case the import is the problem; you should set it in the file you are importing.
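For instance, a minimal sketch of what roles/memcacheExtension/tasks/main.yml could look like with escalation set directly on the failing task (the path is inferred from the role name in your question):
# roles/memcacheExtension/tasks/main.yml
- name: Install Memcached
  yum:
    name: memcached.x86_64
    state: present
  become: yes
  become_method: sudo
  become_user: root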
I ended up telling Ansible to become root for some of the tasks that were failing (my example wasn't the only one failing), and it worked well. The tweak in my environment is that I can't log in as root, but I can "become" root once logged in as someone else.
Here is what my task looks like now:
- name: Install Memcached
  yum:
    name: memcached.x86_64
    state: present
  become_user: root
Use the shell module instead of yum.
- name: Install Memcached
  shell: sudo yum install -y {{ your_package_here }}
Not as cool as using a module, but it will get the job done.
Your become_user is ok. If you don't use it, you'll end up trying to run the commands in the playbook as the user used to establish the SSH connection (ansible_user or remote_user, or the user executing the playbook).

Ansible Yum Module pending transactions error

I'm very new to Ansible.
I am trying to follow a tutorial on the concept of Roles in Ansible.
I have the following Master Playbook:
--- # Master Playbook for Webservers
- hosts: apacheweb
  user: test
  sudo: yes
  connection: ssh
  roles:
    - webservers
Which refers to the webservers role, which has the following tasks/main.yml:
- name: Install Apache Web Server
  yum: pkg=httpd state=latest
  notify: Restart HTTPD
And a handlers/main.yml:
- name: Restart HTTPD
  service: name=httpd state=started
When I execute the Master Playbook, mentioned above, I get the following error:
TASK [webservers : Install Apache Web Server] **********************************
fatal: [test.server.com]: FAILED! => {"changed": false, "failed": true, "msg": "The following packages have pending transactions: httpd-x86_64", "rc": 128, "results": ["The following packages have pending transactions: httpd-x86_64"]}
I cannot understand what this error corresponds to. There does not seem to be anything similar, based on my research, that could suggest an issue with the way I am using the Yum module.
NOTE: Ansible Version:
ansible 2.2.1.0
config file = /etc/ansible/ansible.cfg
It seems there are unfinished / pending transactions on the target host.
Try installing the yum-utils package and running yum-complete-transaction on the target hosts giving the error.
# yum-complete-transaction --cleanup-only
Look at Fixing There are unfinished transactions remaining for more details.
yum-complete-transaction is a program which finds incomplete or aborted yum transactions on a system and attempts to complete them. It looks at the transaction-all* and transaction-done* files which can normally be found in /var/lib/yum if a yum transaction aborted in the middle of execution.
If it finds more than one unfinished transaction it will attempt to complete the most recent one first. You can run it more than once to clean up all unfinished transactions.
Unfinished transaction remaining
sudo yum install yum-utils
yum-complete-transaction --cleanup-only
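If you want the cleanup to be part of the provisioning itself, a rough sketch of the same two steps as Ansible tasks (my own wrapping of the commands above) could be:
- name: Install yum-utils so yum-complete-transaction is available
  yum:
    name: yum-utils
    state: present
- name: Clean up unfinished yum transactions
  command: yum-complete-transaction --cleanup-only
  changed_when: false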
For Ansible, I am using this type of config in my playbooks:
- name: Install Apache Web Server
  yum: name=httpd state=latest
  notify: Restart HTTPD
As far as I know there is no such option as yum: pkg=httpd in Ansible for the yum module (if I'm not wrong, that pkg=httpd is for apt-get on Debian-based distros).
If you need to install multiple packages you could use something like:
- name: "Install httpd packages"
yum: name={{ item }} state=present
with_items:
- httpd
- httpd-devel
- httpd-tools
Of course, you can change state=present to state=latest or whatever option suits you best.
http://docs.ansible.com/ansible/yum_module.html - ansible documentation for yum module

Ansible apt module showing "item changed" when I don't think it did

I am trying to install Apache2 with Ansible. I have a role and handler for Apache.
My playbook (site.yml) contains:
---
- hosts: webservers
  remote_user: ansrun
  become: true
  become_method: sudo
The Ansible role file contains:
---
- name: Install Apache 2
  apt: name={{ item }} update_cache=yes state=present
  with_items:
    - apache2
  when: ansible_distribution == "Ubuntu"

- name: Enable mod_rewrite
  apache2_module: name=rewrite state=present
  notify:
    - reload apache2
Whenever I run the playbook, I get this message, but nothing has changed.
changed: [10.0.1.200] => (item=[u'apache2'])
I think this has something to do with the conditional.
You are running into a problem introduced in Ansible 2.2.0 (and fixed in 2.2.1).
With update_cache=yes, the apt module was changed to report a changed status whenever the APT cache was updated, not only when the actual package was installed or upgraded.
You need to do one of the following:
upgrade Ansible to at least 2.2.1 (released officially on Jan 16th; before the release you would have had to run it from source, as the release candidate was not available in PyPI);
downgrade Ansible to 2.1.3;
retain Ansible 2.2.0 and split the Install Apache 2 task into two (sketched after this list):
one for cache update only (maybe with changed_when set to false),
one for the actual apache2 package installation (without update_cache=yes), calling the handler.
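A minimal sketch of that third option on Ansible 2.2.0, reusing the key=value style from your role (cache_valid_time is optional and only avoids refreshing the cache on every run):
- name: Update APT cache only
  apt: update_cache=yes cache_valid_time=3600
  changed_when: false
- name: Install Apache 2
  apt: name={{ item }} state=present
  with_items:
    - apache2
  when: ansible_distribution == "Ubuntu"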
