Create and use group without restart - ansible

I have a task that creates a group:
- name: add user to docker group
  user: name=USERNAME groups=docker append=yes
  sudo: true
In another playbook I need to run a command that relies on the new group membership. Unfortunately this does not work, because the new group is only picked up after I log out and log in again.
I have tried things like:
su -l USERNAME
or
newgrp docker; newgrp
but nothing worked. Is there any chance to force Ansible to reconnect to the host and do a re-login? A reboot would be the last option.

You can use an (ansible.builtin.)meta: reset_connection task:
- name: add user to docker group
  ansible.builtin.user:
    name: USERNAME
    groups: docker
    append: yes
- name: reset ssh connection to allow user changes to affect ansible user
  ansible.builtin.meta:
    reset_connection
Note that you cannot use a conditional to run this task only when the ansible.builtin.user task reported a change, as the "reset_connection task does not support when conditional", see #27565.
The reset_connection meta task was added in Ansible 2.3, but remained a bit buggy in versions prior to v2.5.8, see #27520.
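For context, a minimal play using this pattern might look like the following sketch (the final docker ps task is only an illustrative assumption of "a command that relies on the new group membership"):
- hosts: all
  become: true
  tasks:
    - name: add user to docker group
      ansible.builtin.user:
        name: USERNAME
        groups: docker
        append: yes
    - name: reset ssh connection to allow user changes to affect ansible user
      ansible.builtin.meta: reset_connection
    - name: run a command that needs the new group membership
      ansible.builtin.command: docker ps
      become: false   # run as the connecting user, who now picks up the docker group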

For Ansible 2 I created a Galaxy role: https://galaxy.ansible.com/udondan/ssh-reconnect/
Usage:
- name: add user to docker group
  user: name=USERNAME groups=docker append=yes
  sudo: true
  notify:
    - Kill all ssh connections
If you immediately need the new group you can either call the module yourself:
- name: Kill own ssh connections
  ssh-reconnect: all=True
Or alternatively fire the handlers when required
- meta: flush_handlers
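If you do not want to pull in the role, a hand-rolled handler along the same lines might look like this (only a sketch, reusing the kill command shown in the older answer below; the handler shipped with the role may differ):
handlers:
  - name: Kill all ssh connections
    # terminate this user's sshd sessions so Ansible has to reconnect on the next task
    shell: "ps -ef | grep sshd | grep `whoami` | awk '{print \"kill -9\", $2}' | sh"
    failed_when: false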
For Ansible < 1.9 see this answer:
Do you use ssh control sockets? If you have ControlMaster activated in your ssh config, this would explain the behavior. Ansible re-connects for every task, so the user should have the correct groups assigned on the next task. But when you use ssh session sharing, Ansible of course re-uses the open ssh connection and therefore does not log in again.
You can deactivate the session sharing in your ansible.cfg:
[ssh_connection]
ssh_args = -S "none"
Since session sharing is a good thing to speed up Ansible plays, there is an alternative. Run a task which kills all ssh connections for your current user.
- name: add user to docker group
  user: name=USERNAME groups=docker append=yes
  sudo: true
  register: user_task
- name: Kill open ssh sessions
  shell: "ps -ef | grep sshd | grep `whoami` | awk '{print \"kill -9\", $2}' | sh"
  when: user_task | changed
  failed_when: false
This will force Ansible to re-login at the next task.

Another option I've found would be to use async: to queue up killing sshd in the background, without relying on an open connection. It feels incredibly hacky, but it seems to work reliably in both Ansible 1.9 and 2.0.
- name: Kill SSH
  shell: sleep 1; pkill -u {{ ansible_ssh_user }} sshd
  async: 3
  poll: 2
Pause for 1 second, then kill sshd. Start checking for the job to be finished after 2 seconds, maximum allowed time is 3 seconds. In my limited testing, it seems to solve the problem of refreshing the current user's groups with only a minimal delay.

Try removing the socket folder during the play; it works on my side (I don't know if it's the cleanest solution). Oddly, meta: reset_connection is not working with Ansible 2.4.
- name: reset ssh connection
  local_action:
    module: file
    path: "~/.ansible/cp"
    state: absent

Related

Executing a task after being logged in as root in ansible

I am trying to subsequently run a task after I am connected using ssh. I am connecting using this in my playbook:
- name: connect using password # task 1; this task connects me as root
  expect:
    command: ssh -o "StrictHostKeyChecking=no" myuser@********
    responses:
      "password:":
        - my password
        - my password
  delegate_to: localhost
That task is fine and I am able to see that I am connected. The problem now is that when I try to run subsequent tasks for example:
- name: copy folder # task 2 in the same playbook
  copy:
    src: "files/mylocalfile.txt"
    dest: "etc/temp"
    mode: "0777"
I have the following message:
"msg: etc/temp not writable"
How do I continue executing the remaining tasks as root, using the connection that was established in task 1?
I believe this might not be an ansible question, but a linux one.
Is your user in the wheel group?
Ansible has the directive become, which will let you execute a task as root, if the user you are connecting with is allowed to escalate privileges. The task you want to run with privileges would be something like:
- name: copy folder # task 2 in the same playbook
  become: yes
  copy:
    src: "files/mylocalfile.txt"
    dest: "etc/temp"
    mode: "0777"
You can use become_user if you need to specify the user you want to run the task as, and if you have a password for the privileged user, you can ask ansible to prompt for it when running ansible-playbook, using --ask-become-pass.
The following link offers documentation about privilege escalation in ansible:
https://docs.ansible.com/ansible/latest/user_guide/become.html
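Putting it together, a minimal sketch might look like this (the host name, user name and the leading slash on dest are illustrative assumptions):
- hosts: myserver
  remote_user: myuser   # unprivileged account used for the ssh connection
  become: yes           # escalate to root for the tasks in this play
  tasks:
    - name: copy folder
      copy:
        src: "files/mylocalfile.txt"
        dest: "/etc/temp"
        mode: "0777"
If the connecting user needs a password for sudo, run the playbook with ansible-playbook site.yml --ask-become-pass.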

Ansible is getting stuck on a server which is in a bad ps state?

I have an ansible playbook as shown below and it works fine most of the time. But recently I have noticed that it gets stuck on some of the servers from the ALL group and just sits there. It doesn't even move on to the other servers in the ALL list.
# This will copy files
---
- hosts: ALL
  serial: "{{ num_serial }}"
  tasks:
    - name: copy files
      shell: "(ssh -o StrictHostKeyChecking=no abc.com 'ls -1 /var/lib/jenkins/workspace/copy/stuff/*' | parallel -j20 'scp -o StrictHostKeyChecking=no abc.com:{} /data/records/')"
    - name: sleep for 5 sec
      pause: seconds=5
So when I started debugging, I noticed that on the affected server I can ssh (log in) fine, but when I run the ps command it just hangs and I don't get my cursor back, which means ansible is also getting stuck executing the above scp command on that server.
So my question is: even if some server is in that state, why doesn't Ansible just time out and move on to the other servers? Is there anything we can do here so that ansible doesn't pause everything while waiting for that one server to respond?
Note the server is up and running and I can ssh fine, but the ps command just hangs, and because of that Ansible hangs too.
Is there any way to run the command ps aux | grep app on all the servers in the ALL group, make a list of all the servers which executed this command fine (timing out and moving to the next server in the ALL list if it hangs on some server), and then pass that list on to the rest of my ansible playbook? Can we do all this in one playbook?
Ansible doesn't have this feature and it might even be dangerous to have it. My suggestion in this case would be: see the failure, rebuild the server, run again.
It's possible to build the feature you want in your playbook: what you could do is have a dummy async task that triggers the issue, and then verify the outcome of that. If the async task didn't finish in a reasonable time, use the meta: end_host task to move to the next host.
You might need to mark some of those tasks with ignore_errors: yes.
Sorry that I cannot give you a complete answer as I've never tried to do this.
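A rough, untested sketch of that async-probe idea (task names, timings and the probe command are assumptions):
- name: Probe ps in the background so a hung command cannot block the play
  shell: ps aux | grep app
  async: 15      # let the probe run for at most 15 seconds
  poll: 0        # fire and forget; check the result separately
  register: ps_probe
  ignore_errors: yes
- name: Give the probe a short window to finish
  async_status:
    jid: "{{ ps_probe.ansible_job_id }}"
  register: probe_result
  until: probe_result.finished
  retries: 5
  delay: 2
  ignore_errors: yes
- name: Move on to the next host if ps never came back
  meta: end_host
  when: probe_result.finished is not defined or probe_result.finished != 1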
You can use strategies to achieve your goal. By default:
Plays run with a linear strategy, in which all hosts will run each task before any host starts the next task.
By using the free strategy, each host will run until the end of the play as fast as it can. For example:
---
- hosts: ALL
  strategy: free
  tasks:
    - name: copy files
      shell: "(ssh -o StrictHostKeyChecking=no abc.com 'ls -1 /var/lib/jenkins/workspace/copy/stuff/*' | parallel -j20 'scp -o StrictHostKeyChecking=no abc.com:{} /data/records/')"
    - name: sleep for 5 sec
      pause: seconds=5
Another option would be to use timeout to run your command, then use the registered result to determine whether the command executed successfully or not. For example, timeout 5 sleep 10 returns 124 because of the timeout, while timeout 5 sleep 3 returns 0 because the command terminates before the timeout occurs. In an ansible script, you could use something like:
tasks:
  - shell: timeout 5 ps aux | grep app
    register: result
    ignore_errors: True
  - debug:
      msg: timeout occurred
    when: result.rc == 124
As suggested by Alassane Ndiaye, you can try the code snippet below, where I add a condition for when the shell command did not time out:
tasks:
  - shell: timeout 5 ps aux | grep app
    register: result
    ignore_errors: True
  - name: Run your shell command
    shell: "(ssh -o StrictHostKeyChecking=no abc.com 'ls -1 /var/lib/jenkins/workspace/copy/stuff/*' | parallel -j20 'scp -o StrictHostKeyChecking=no abc.com:{} /data/records/')"
    when: result.rc != 124 and result.rc != 0

How to switch out of the root account during server set up?

I need to automate the deployment of some remote Debian servers. These servers come with only the root account. I wish to make it such that the only time I ever need to login as root is during the set up process. Subsequent logins will be done using a regular user account, which will be created during the set up process.
However during the set up process, I need to set PermitRootLogin no and PasswordAuthentication no in /etc/ssh/sshd_config. Then I will be doing a service sshd restart. This will stop the ssh connection because ansible had logged into the server using the root account.
My question is: How do I make ansible ssh into the root account, create a regular user account, set PermitRootLogin no and PasswordAuthentication no, then ssh into the server using the regular user account and do the remaining set up tasks?
It is entirely possible that my set-up process is flawed. I will appreciate suggestions.
You can actually manage the entire setup process with Ansible, without requiring manual configuration prerequisites.
Interestingly, you can change ansible_user and ansible_password on the fly, using set_fact. Remaining tasks executed after set_fact will be executed using the new credentials:
- name: "Switch remote user on the fly"
  hosts: my_new_hosts
  vars:
    reg_ansible_user: "regular_user"
    reg_ansible_password: "secret_pw"
  gather_facts: false
  become: false
  tasks:
    - name: "(before) check login user"
      command: whoami
      register: result_before
    - debug: msg="(before) whoami={{ result_before.stdout }}"
    - name: "change ansible_user and ansible_password"
      set_fact:
        ansible_user: "{{ reg_ansible_user }}"
        ansible_password: "{{ reg_ansible_password }}"
    - name: "(after) check login user"
      command: whoami
      register: result_after
    - debug: msg="(after) whoami={{ result_after.stdout }}"
Furthermore, you don't have to fully restart sshd to cause configuration changes to take effect, and existing SSH connections will stay open. Per sshd(8) manpage:
sshd rereads its configuration file when it receives a hangup signal, SIGHUP....
So, your setup playbook could be something like:
1. log in initially with the root account
2. create the regular user and set his password or configure authorized_keys
3. configure sudoers to allow the regular user to execute commands as root
4. use set_fact to switch to that account for the rest of the playbook (remember to use become: true on tasks after this one, since you have switched from root to the regular user; you might even try executing a test sudo command before locking root out)
5. change the sshd configuration (a sketch of this step and the next follows below)
6. execute kill -HUP <sshd_pid>
7. verify by setting ansible_user back to root, and fail if login works
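A rough sketch of steps 5 and 6 (the lineinfile patterns and the pid file path are assumptions; Debian typically keeps the sshd pid in /run/sshd.pid):
- name: Disable root login and password authentication
  become: true
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "^#?{{ item.key }}"
    line: "{{ item.key }} {{ item.value }}"
  loop:
    - { key: "PermitRootLogin", value: "no" }
    - { key: "PasswordAuthentication", value: "no" }
  register: sshd_config
- name: Reload sshd with SIGHUP so existing connections stay open
  become: true
  shell: kill -HUP "$(cat /run/sshd.pid)"
  when: sshd_config is changed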
You probably just want to create a standard user account and add it to sudoers. You could then run ansible with the standard user, and if you need a command to run as root, you just prefix the command with sudo.
I wrote an article about setting up a deploy user
http://www.altmake.com/2013/03/06/secure-lamp-setup-on-amazon-linux-ami/
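A minimal sketch of that approach (the user name, group and sudoers drop-in are assumptions for a Debian/Ubuntu-style system):
- name: Create a deploy user with sudo rights
  user:
    name: deploy
    shell: /bin/bash
    groups: sudo          # use "wheel" instead on RHEL-style systems
    append: yes
- name: Allow passwordless sudo for the deploy user
  copy:
    dest: /etc/sudoers.d/deploy
    content: "deploy ALL=(ALL) NOPASSWD:ALL\n"
    mode: "0440"
    validate: "visudo -cf %s"
After that, run ansible as deploy and add become: yes to tasks (or prefix manual commands with sudo) whenever root is needed.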

How to communicate between two remote machines through Ansible

I am running an ansible playbook from system 1 which runs tasks on system 2 to take a backup, and after that I want to copy the backup file from system 2 to system 3.
I am doing this task to automate the command below, where /bck1/test is on system 2 and /opt/backup is on system 3:
rsync -r -v -e ssh /bck1/test.* root@host3:/opt/backup
You can run the raw rsync command with the shell module.
tasks:
  - shell: rsync -r -v -e ssh /bck1/test.* root@host3:/opt/backup
For this to work, you will either need to have your private ssh key deployed to system 2 or, preferably, enable ssh agent forwarding, for example in your .ssh/config:
Host host2
  ForwardAgent yes
Additionally sshd on system 2 would need to accept agent forwarding. Here are some tasks which I use to do this:
- name: Ensure sshd allows agent forwarding
  lineinfile: dest=/etc/ssh/sshd_config
              regexp=^#?AllowAgentForwarding
              line="AllowAgentForwarding yes"
              follow=yes
              backup=yes
  sudo: yes
  register: changed_sshd_config
- name: "Debian: Restart sshd"
  shell: invoke-rc.d ssh restart
  sudo: yes
  when:
    - ansible_distribution in [ "Debian", "Ubuntu" ]
    - changed_sshd_config | changed
- name: "CentOS 7: Restart sshd"
  shell: systemctl restart sshd.service
  sudo: yes
  when:
    - ansible_distribution == "CentOS"
    - ansible_distribution_major_version == "7"
    - changed_sshd_config | changed
There are two separate tasks for restarting sshd, one for Debian and one for CentOS 7. Pick what you need, or adapt them to your system.
You might need to configure this in a separate playbook, because Ansible will keep an open ssh connection to the host, and after activating agent forwarding you will most probably need to re-connect.
PS: It's not the best idea to allow ssh login for user root, but that is another topic. :)

Ansible wait_for to check that the connecting machine can actually log in

In my working environment, virtual machines are created and, after creation, login access information is added to them. There can be delays, so having my ansible script just check whether SSH is available is not enough; I actually need to check whether ansible can get inside the remote machine via ssh.
Here is my old script which fails me:
- name: wait for instances to listen on port:22
  wait_for:
    state: started
    host: "{{ item }}"
    port: 22
  with_items: myservers
How can I rewrite this task snippet to wait until the local machine can ssh into the remote machines (again, not only checking whether ssh is ready at the remote end, but whether it can actually authenticate)?
This is somewhat ugly, but given your needs it might work:
- local_action: command ssh myuser@{{ inventory_hostname }} exit
  register: log_output
  until: log_output.stdout.find("Last login") > -1
  retries: 10
  delay: 5
The first line would cause your ansible host to try to ssh into the target host and immediately issue an "exit" to return control back to ansible. Any output from that command gets stored in the log_output variable. The until clause will check the output for the string 'Last login' (you may want to change this to something else depending on your environment), and Ansible will retry this task up to 10 times with a 5 second delay between attempts.
Bruce P's answer was close to what I needed, but my ssh doesn't print any banner when running a command, so checking stdout is problematic.
So instead I use the return code to check for success:
- local_action: command ssh "{{ hostname }}" exit
  register: ssh_test
  until: ssh_test.rc == 0
  retries: 25
  delay: 5
As long as your Ansible user is already installed on the image you are using to create the new server instance, the wait_for command works well.
If that is not the case, then you need to poll the system that adds that user to the newly created instance for when you should continue - of course that system will have to have something to poll against...
The (very ugly) alternative is to put a static pause in your script that will wait the appropriate amount of time between the instance being created and the user being added like so:
- pause: seconds=1
Try not to though, static pauses are a bad way of solving this issue.
