How can I run a local-exec provisioner AFTER cloud-init / user_data? - ansible

I'm experiencing a race condition issue on Terraform when running an Ansible playbook with the local-exec provisioner. At one point, that playbook has to install an APT package.
But first, I'm running a cloud-config file init.yml specified in the user_data argument that installs a package as well.
Consequently, I'm getting the following error:
Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
How can I prevent this?
# init.yml
runcmd:
  - sudo apt-get update
  - sudo apt-get -y install python python3
# main.tf
resource "digitalocean_droplet" "hotdog" {
  image     = "ubuntu-18-04-x64"
  name      = "my_droplet"
  region    = "FRA1"
  size      = "s-1vcpu-1gb"
  user_data = file("init.yml")

  provisioner "local-exec" {
    command = "ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i '${self.ipv4_address},' ./playbook.yml"
  }
}

Disclaimer: my Terraform knowledge is quite sparse compared to my Ansible knowledge. The below should work, but there might be Terraform-centric options I totally missed.
A very easy solution is to use an until loop to retry the task until it succeeds (i.e. until the lock is released).
- name: retry apt task every 5s for 1 minute until it succeeds (i.e. the lock is released)
  apt:
    name: my_package
  register: apt_install
  until: apt_install is success
  delay: 5
  retries: 12
A better approach would be to make sure there is no lock in place on the various dpkg lock files. I did not go through the exercise of implementing this in Ansible, and you might need a specific script or custom module to succeed. If you want to give it a try, there is a question with a solution on Server Fault.
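A minimal sketch of that idea (untested): it assumes the usual Debian/Ubuntu lock file paths and that fuser is available on the target; adjust the paths to your distribution.
- name: wait until no process holds the dpkg/apt locks
  # note: this loops forever if the lock is never released; add your own timeout if needed
  shell: >
    while fuser /var/lib/dpkg/lock-frontend /var/lib/dpkg/lock /var/lib/apt/lists/lock >/dev/null 2>&1;
    do sleep 5; done
  changed_when: false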
In your context, and since this problem seems quite common, I searched a bit and came across this issue on GitHub. I think that my adaptation below will meet your requirements and might help with any other possible race condition inside your init phase.
Modify your user_data as:
---
runcmd:
  - touch /tmp/user-init-running
  - sudo apt-get update
  - sudo apt-get -y install python python3
  - rm /tmp/user-init-running
And in your playbook:
- name: wait for init phase to end, error out after 1 minute
  wait_for:
    path: /tmp/user-init-running
    state: absent
    timeout: 60

- name: install package
  apt:
    name: mypkg
    state: present
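As a side note, on reasonably recent Ansible (the apt module gained a lock_timeout parameter around 2.12, default 60 seconds; check your version), you can also let the module itself wait for the dpkg/apt locks. A sketch:
- name: install package, waiting up to 2 minutes for the apt/dpkg locks
  apt:
    name: mypkg
    state: present
    lock_timeout: 120   # assumes apt module with lock_timeout support (ansible-core 2.12+)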

Related

Ansible Run Shell Command Upon Condition in Multiple Hosts

I have the following script that attempts to install a package on a node only when not already installed.
- name: check if linux-modules-extra-raspi is installed # Task 1
  package:
    name: linux-modules-extra-raspi
    state: present
  check_mode: true
  register: check_raspi_module

- name: install linux-modules-extra-raspi if not installed # Task 2
  shell: |
    sudo dpkg --configure -a
    apt install linux-modules-extra-raspi
  when: not check_raspi_module.changed
But the problem here is that if I have a set of hosts, Task 1 runs for node n1 and registers check_raspi_module as false, and then Task 1 runs for node n2 and sets it to true because that package is already available on node n2. So how can I throttle this and have check_raspi_module local to a task and not global like it is now?
If you need to install a package, you just have to use the first block, as shown below. You don't need separate check and install blocks.
Even if your package is already installed, Ansible will detect that and not reinstall it. That is the principle of idempotency in Ansible.
The documentation: here
(definition) state: present (meaning: install if not present)
- name: install linux-modules-extra-raspi if not present
  ansible.builtin.package:
    name: linux-modules-extra-raspi
    state: present

Ansible playbook check if service is up if not - install something

I need to install Symantec Endpoint Security on my Linux system and I'm trying to write a playbook to do so.
When I want to install the program I use ./install.sh -i
But after the installation, when I run the installation again, I get this message:
root@TestKubuntu:/usr/SEP# ./install.sh -i
Starting to install Symantec Endpoint Protection for Linux
Downgrade is not supported. Please make sure the target version is newer than the original one.
This is how I install it in the playbook:
- name: Install_SEP
  command: bash /usr/SEP/install.sh -i
If possible, I would like to check whether the service is up and, if there is no service, install it; or maybe there is a better way of doing this.
Thank you very much for your time
Q: "I would like to check if the service is up and if there is no service then install it."
It's possible to use service_facts. For example, to check whether a service is running:
vars:
  my_service: "<name-of-my-service>"
tasks:
  - name: Collect service facts
    service_facts:

  - name: Install service when not running
    command: "<install-service>"
    when: "my_service not in ansible_facts.services|
           dict2items|
           json_query('[?value.state == `running`].key')"
To check whether the service is installed at all, use json_query('[].key') in the same expression instead.
(not tested)
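For readability, roughly the same check can also be written without json_query. A sketch (also not tested; note that service_facts usually reports systemd unit names with the .service suffix, so my_service would need to include it):
- name: Collect service facts
  service_facts:

- name: Install service when not present or not running
  command: "<install-service>"
  when: >
    my_service not in ansible_facts.services or
    ansible_facts.services[my_service].state != 'running'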
Please try something like below.
- name: Check if service is up
  command: <command to check if service is up>
  register: output

- name: Install_SEP
  command: bash /usr/SEP/install.sh -i
  when: "'running' not in output.stdout"
Note: I have used 'running' in the when condition; if your service check command returns something more specific, use that instead of 'running'.
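For example, a minimal sketch assuming the product registers a systemd unit (the unit name sep-service here is purely hypothetical; substitute the real one):
- name: Check if service is up
  command: systemctl is-active sep-service   # 'sep-service' is a placeholder unit name
  register: output
  failed_when: false    # is-active exits non-zero when the unit is not active
  changed_when: false

- name: Install_SEP
  command: bash /usr/SEP/install.sh -i
  when: output.stdout != 'active'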

How to force Ansible to use sudo to install packages?

I have a playbook that runs roles and logs in to the server with a user that has sudo privileges. The problem is that, when switching to this user, I still need to use sudo to, say, install packages.
i.e.:
sudo yum install httpd
However, Ansible seems to ignore that and will try to install packages without sudo, which results in a failure.
Ansible will run the following:
yum install httpd
This is the role that I use:
tasks:
  - name: Import du role 'memcacheExtension'
    import_role:
      name: memcacheExtension
    become: yes
    become_method: sudo
    become_user: "{{ become_user }}"
    become_flags: '-i'
    tags:
      - never
      - memcached
And this is the task that fails in my context:
- name: Install Memcached
  yum:
    name: memcached.x86_64
    state: present
Am I setting the sudo parameter in the wrong place? Or am I doing something wrong?
Thank you in advance
You can specify become: yes in a few places. Often it is used at the task level; sometimes it is passed as a command line parameter (--become, -b: run operations with become). It can also be set at the play level:
- hosts: servers
  become: yes
  become_method: enable
  tasks:
    - name: Hello
      ...
You can also enable it in group_vars:
# group_vars/example.yml
ansible_become: yes
For your example, installing software, I would set it at the task level. I think in your case the import is the problem: you should set become in the file you are importing.
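For instance, a minimal sketch of that idea (the path roles/memcacheExtension/tasks/main.yml is an assumption based on the standard role layout and the role name in the question):
# roles/memcacheExtension/tasks/main.yml (assumed path)
- name: Install Memcached
  yum:
    name: memcached.x86_64
    state: present
  become: yes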
I ended up telling Ansible to become root for some of the tasks that were failing (my example wasn't the only one failing), and it worked well. The tweak in my environment is that I can't log in as root, but I can "become" root once logged in as someone else.
Here is how my tasks looks like now:
- name: Install Memcached
  yum:
    name: memcached.x86_64
    state: present
  become_user: root
Use the shell module instead of yum.
- name: Install Memcached
  shell: sudo yum install -y {{ your_package_here }}
Not as cool as using a module, but it will get the job done.
Your become_user is OK. If you don't use it, you'll end up running the commands in the playbook as the user used to establish the SSH connection (ansible_user, remote_user, or the user executing the playbook).
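To make the distinction concrete, a minimal sketch (the host group and connection user here are hypothetical): the connection user and the privilege-escalation user are configured separately.
- hosts: servers
  remote_user: deploy      # user for the SSH connection (hypothetical)
  become: yes              # escalate privileges with sudo
  become_user: root        # user to become (root is the default)
  tasks:
    - name: Install Memcached
      yum:
        name: memcached.x86_64
        state: present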

Ansible playbook fails to lock apt

I took over a project that is running on Ansible for server provisioning and management. I'm fairly new to Ansible but thanks to the good documentation I'm getting my head around it.
Still, I'm getting an error with the following output:
failed: [build] (item=[u'software-properties-common', u'python-pycurl', u'openssh-server', u'ufw', u'unattended-upgrades', u'vim', u'curl', u'git', u'ntp']) => {"failed": true, "item": ["software-properties-common", "python-pycurl", "openssh-server", "ufw", "unattended-upgrades", "vim", "curl", "git", "ntp"], "msg": "Failed to lock apt for exclusive operation"}
The playbook is run with sudo: yes so I don't understand why I'm getting this error (which looks like a permission error). Any idea how to track this down?
- name: "Install very important packages"
apt: pkg={{ item }} update_cache=yes state=present
with_items:
- software-properties-common # for apt repository management
- python-pycurl # for apt repository management (Ansible support)
- openssh-server
- ufw
- unattended-upgrades
- vim
- curl
- git
- ntp
playbook:
- hosts: build.url.com
  sudo: yes
  roles:
    - { role: postgresql, tags: postgresql }
    - { role: ruby, tags: ruby }
    - { role: build, tags: build }
I just had the same issue on a new VM. I tried many approaches, including retrying the apt commands, but in the end the only way to do this was by removing unattended upgrades.
I'm using raw commands here, since at this point the VM doesn't have Python installed, so I need to install that first, but I need a reliable apt for that.
Since it is a VM and I was testing the playbook by resetting it to a snapshot, the system date was off, which forced me to use the date -s command to avoid SSL certificate problems during apt commands. That date -s triggered an unattended upgrade.
So this playbook snippet is basically the part relevant to disabling unattended upgrades on a new system; these are the first commands I issue.
- name: Disable timers for unattended upgrade, so that none will be triggered by the `date -s` call
  raw: systemctl disable --now {{ item }}
  with_items:
    - 'apt-daily.timer'
    - 'apt-daily-upgrade.timer'

- name: Reload systemctl daemon to apply the new changes
  raw: systemctl daemon-reload

# Syncing time is only relevant for testing, because of the VM's outdated date.
#- name: Sync time
#  raw: date -s "{{ lookup('pipe', 'date') }}"

- name: Wait for any possibly running unattended upgrade to finish
  raw: systemd-run --property="After=apt-daily.service apt-daily-upgrade.service" --wait /bin/true

- name: Purge unattended upgrades
  raw: apt-get -y purge unattended-upgrades

- name: Update apt cache
  raw: apt-get -y update

- name: If needed, install Python
  raw: test -e /usr/bin/python || apt-get -y install python
Anything else would cause apt commands to randomly fail because of locking issues caused by unattended upgrades.
This is a very common situation when provisioning Ubuntu (and likely some other distributions): you try to run Ansible while automatic updates are running in the background, which is exactly what happens right after setting up a new machine. As APT holds an exclusive lock, Ansible gets kicked out.
The playbook is OK, and the easiest way to verify this is to run it later (after the automatic update process finishes).
For a permanent resolution, you might want to:
use an OS image with automatic updates disabled
add an explicit retry loop in the Ansible playbook to repeat the failed task until it succeeds (a sketch follows below)
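A minimal sketch of such a retry loop, using the question's package list and current apt module syntax (on the very old Ansible version in the question you would keep the with_items form, but the until/retries/delay keywords work the same way):
- name: "Install very important packages"
  apt:
    name:
      - software-properties-common
      - python-pycurl
      - openssh-server
      - ufw
      - unattended-upgrades
      - vim
      - curl
      - git
      - ntp
    update_cache: yes
    state: present
  register: apt_result
  until: apt_result is success   # retry while apt is still locked
  retries: 12
  delay: 10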

Ansible 1.9.4 : Failed to lock apt for exclusive operation

I bumped into the "Failed to lock apt for exclusive operation" issue:
https://github.com/geerlingguy/ansible-role-apache/issues/50
I posted a lot of details on GitHub.
I googled a lot of "Failed to lock apt for exclusive operation" Ansible complaints, but found no simple answer. Any help?
I'm also getting this error while setting up a couple of new boxes. I'm connecting as root, so I didn't think it was necessary, but it is:
become: yes
Now everything works as intended.
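For placement, a minimal sketch at the play level (the host group and package are just examples, not from the question):
- hosts: newboxes
  remote_user: root
  become: yes
  tasks:
    - name: Install packages
      apt:
        name: git
        state: present
        update_cache: yes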
I know this question was answered a long time ago, but for me the solution was different.
The problem was the update_cache step. I had it on every install step, and somehow that caused the "apt failed lock" error.
The solution was adding update_cache as a separate step, like so:
tasks:
  - name: update apt list
    apt:
      update_cache: yes
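The subsequent install tasks can then drop update_cache entirely. A sketch (the package names are placeholders):
  - name: install packages
    apt:
      name:
        - package1
        - package2
      state: present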
Running the following commands in this sequence should resolve the issue:
sudo rm -rf /var/lib/apt/lists/*
sudo apt-get clean
sudo apt-get update
In my case, Ansible was trying to lock /var/lib/apt/lists/, which did not exist.
The error ansible gave me was Failed to lock apt for exclusive operation: Failed to lock directory /var/lib/apt/lists/: E:Could not open lock file /var/lib/apt/lists/lock - open (2: No such file or directory)
Adding the lists directory fixed the issue:
- name: packages | ensure apt list dir exists
  file:
    path: /var/lib/apt/lists/
    state: directory
    mode: 0755
The answer from @guaka was entirely correct for me. It lacks only one thing: where to put become: yes.
In the following, there are three places the become might go. Only one of them is correct:
- name: setup nginx web server
  #1 become: yes
  include_role:
    name: nginx
    #2 become: yes
  vars:
    ansible_user: "{{ devops_ssh_user }}"
    #3 become: yes
In position #2 you will get an unexpected parameter error, because become is not a parameter for the include_role module.
In position #3 the play will run, but will not execute as sudo.
Only in position #1 does become: yes do any good. become is a parameter for the task, not for the module. It is not a variable like ansible_user. Just to be clear, here's a working configuration:
- name: setup nginx web server
  become: yes
  include_role:
    name: nginx
  vars:
    ansible_user: "{{ devops_ssh_user }}"
I had remote_user: root in my playbook, and this happened a lot. I couldn't figure out why; it was happening only on some roles but not others (though always at the same place). I removed remote_user: root and it stopped happening. I just use --become and --become-user=root (which I was already using, but it still randomly failed to get a lock for some reason).
In my hosts file, some hosts use a different user for sudo than the one Ansible uses for SSH, so I had ansible_user and ansible_become_user specified for every host in the inventory.
On those hosts where ansible_user == ansible_become_user I got "Failed to lock apt for exclusive operation".
The solution for me was to remove the ansible_become_user line in those cases where the user is the same in both parameters.
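A minimal sketch of that change in a YAML inventory (the host name, address, and user are hypothetical):
all:
  hosts:
    web1:
      ansible_host: 203.0.113.10
      ansible_user: deploy
      # ansible_become_user: deploy   <- removed, since it was equal to ansible_user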
OK, so there is nothing wrong with Ansible, SSH or the role. It is just that apt on Debian can get really confused and lock itself out. As I was using a home-brewed Docker VM, I only had to recreate my image and container to get APT working again.
I faced the same error while using Ansible to set up a Ceph cluster. Just running ansible-playbook without sudo did the trick for me!
Adding become: yes to the task solved my problem:
- name: Install dependencies
  apt:
    name: "{{ packages }}"
    state: present
    update_cache: yes
  vars:
    packages:
      - package1
      - package2
      - package3
  become: yes
In my case, the silly problem was that I had the same host in my inventory, but under two different DNS names.
So, apt was legitimately locked in my case, because I had multiple ansible connections to the same host doing the same thing at the same time. Hah! Doh!
https://serverfault.com/questions/716260/ansible-sudo-error-while-building-on-atlas
Check ps aux | grep apt output for suspicious processes.