Terraform: how to ignore failed servers while provisioning?

I am running Terraform against a private OpenStack cloud to bootstrap new servers. When I create new servers (by any method) during the busiest times of operation (weekday afternoons), usually half of the servers fail, and this has nothing to do with Terraform. The issue is that when one of the servers I try to provision fails to complete its provisioner "remote-exec" block without errors (because of my private cloud), my whole terraform apply stops.
I want Terraform to completely ignore these failed servers when I run terraform apply, so that if I try to provision 20 servers and only 1 of them launches successfully, that one server will still run through all the commands I specify in my resource block.
Is there something like an ignore_failed_resources = true line I can add to my resources so that Terraform will ignore the servers that fail and run the successful ones to completion?

There's no simple config switch that you can enable to achieve this. Could you be more specific about the problem that's causing the "remote-exec" to fail?
If it's failing because the server is simply refusing the connection, you could switch out your "remote-exec" for a "local-exec" and wrap your command in a script, passing in the hostname of your server as a parameter. The script would then handle initiating the SSH connection and running the required commands. Make sure the script fails gracefully with an exit code of 0 when the error occurs, so that Terraform treats the script as having succeeded; a sketch of such a wrapper script follows the example below.
resource "aws_instance" "web" {
provisioner "local-exec" {
command = "./myremotecommand.sh ${aws_instance.web.private_ip}"
}
}
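A minimal sketch of such a wrapper script, named myremotecommand.sh to match the example above (the SSH user, options, and remote command are placeholders to adapt to your environment):
#!/bin/bash
# myremotecommand.sh - connect to the new server and run the bootstrap
# commands, but never let a failure propagate back to Terraform.
# $1 is the server IP passed in from the local-exec provisioner.
HOST="$1"

# Placeholder user and remote command; adjust to your environment.
if ssh -o ConnectTimeout=30 -o StrictHostKeyChecking=no "centos@${HOST}" 'sudo /tmp/bootstrap.sh'; then
    echo "provisioning succeeded on ${HOST}"
else
    echo "provisioning FAILED on ${HOST}, ignoring as requested" >&2
fi

# Always report success so the terraform apply keeps going.
exit 0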

I'm guessing you already figured out a solution, but I'm adding mine here for anyone who encounters this in the future.
Why not simply do this in the command part?
provisioner "local-exec" {
command = "my command || true"
}
This way, the command always returns exit code 0, so the shell swallows the failure and Terraform ignores it as well.

If you look at the Terraform provisioner documentation, you can set on_failure = continue on the provisioner, so it will continue on failure.
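A minimal sketch of what that looks like, assuming Terraform 0.12+ syntax and the OpenStack provider's openstack_compute_instance_v2 resource (the resource arguments and inline commands are placeholders):
resource "openstack_compute_instance_v2" "web" {
  # ... image, flavor, key pair, and network arguments omitted ...

  provisioner "remote-exec" {
    inline = [
      "echo 'bootstrap commands go here'",
    ]

    # If the provisioner errors out, log it and keep applying the rest
    # of the plan instead of aborting the whole terraform apply.
    on_failure = continue
  }
}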

Related

How to fail an Azure DevOps pipeline task specifically for failures in a bash script

I am using an Azure DevOps pipeline, and in it there is one task that creates a KVM guest VM through Packer inside the host and, once the VM is created, runs a bash script to check the status of the services running inside the guest VM.
If any services are not running or throw an error, this bash script exits with code 3, because I have added the following to the script:
set -e
I want the task to fail if this bash script fails. The issue is that the same task also creates the KVM guest VM, so booting it up and shutting it down throws expected errors; I don't want the task to fail due to those errors, but only when the bash script fails.
I have selected the task option "Fail on Standard Error".
But I'm not sure how to fail the task specifically for bash script errors. Does anyone have suggestions?
You can use the exit 1 command to make the bash task fail; it is often a command you issue right after an error is logged.
Additionally, you can use logging commands to customize the error message. Refer to the sample below.
#!/bin/bash
echo "##vso[task.logissue type=error]Something went very wrong."
exit 1

How to execute some automation scripts after provisioning resources via terraform

Consider this use case, please: as part of our test framework, we have to deploy some resources and then execute some scripts before we can start using the resources for testing. A typical example is the AirView RDS module. The RDS is often provisioned with the flyway module, which has an SSM document for creating the DB. What we have been doing is calling the RDS module and the flyway module and applying them in a Terraform workspace. Once they are successfully deployed (i.e. applied), a human needs to go through the AWS console and execute the script that creates the NGCS database (this is just an example). After that it's ready to be used for testing. I would like to find a way to avoid this human interaction step. So the order of creation and actions should be:
Provision DB cluster
Provision utility EC2 instance (where the flyway script can run)
Execute flyway
How can that be done in an automated way? Further, if I have a few resources that need a similar setup (maybe not flyway, but some kind of script), how can I control the sequence of activities, from creating resources to running scripts on them?
Try using Terraform provisioners. The aws_instance resource, which I suppose you are using, fully supports this feature. With a provisioner, you can run any command you want just after instance creation.
Don't forget to configure the connection settings. You can read more here and here.
Finally, you should end up with something close to this:
resource "aws_instance" "my_instance" {
ami = "${var.instance_ami}"
instance_type = "${var.instance_type}"
subnet_id = "${aws_subnet.my_subnet.id}"
vpc_security_group_ids = ["${aws_security_group.my_sg.id}"]
key_name = "${aws_key_pair.ec2key.key_name}"
provisioner "remote-exec" {
inline = [
"my commands",
]
}
connection {
type = "ssh"
user = "ec2-user"
password = ""
private_key = "${file("~/.ssh/id_rsa")}"
}
}
You need to remember that provisioners are a last resort.
From the docs, have you tried user_data?
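A brief sketch of that approach, reusing the instance from the example above (bootstrap.sh is a placeholder for whatever script you want cloud-init to run on first boot):
resource "aws_instance" "my_instance" {
  ami           = "${var.instance_ami}"
  instance_type = "${var.instance_type}"

  # Runs on first boot via cloud-init, so no SSH connection from the
  # machine running Terraform is required.
  user_data = "${file("${path.module}/bootstrap.sh")}"
}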

Ansible and Terraform Debugging

I have an Ansible playbook that calls a Terraform deployment, then provisions the deployment with another Ansible call.
This gets to a certain point and hangs with limited output to the console. It hangs for a long time with no indication of what is causing it, even when I invoke the -vvv or -vvvv flag.
Is there anything I can do to enable debug output to the console to pinpoint what is causing the hang?
I presume the commands run during the provision are somehow not returning exit 0.
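Not a full answer, but one way to get more signal while it hangs is to turn on the standard debug logging of both tools (a sketch; site.yml and the log paths are placeholders):
# Persist Ansible's full output and enable its internal debug logging.
export ANSIBLE_LOG_PATH=./ansible-debug.log
export ANSIBLE_DEBUG=1

# Have the Terraform run invoked from the playbook write verbose logs too.
export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform-debug.log

ansible-playbook -vvvv site.yml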

How to detect that cloud-init completed initialization

I am configuring an OpenStack box using cloud-init/cloud-config. I intend to wait until it is fully configured before I start using it.
This is not all that hard to do using some marker file or detecting if the cloud-init process is still running, though it seems quite cumbersome to do that in every cloud-init script. Is there some recommended way? Natively supported by cloud-init, ideally?
The following command does the trick:
cloud-init status --wait
From https://ubuntu.com/blog/cloud-init-v-18-2-cli-subcommands:
cloud-init status gives simple human-readable or programmatic output
for what cloud-init is doing and whether it has finished successfully.
It can be used as a sanity check on a machine or in scripts to block
until cloud-init has completed successfully.
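A minimal sketch of using it from a script (assumes a cloud-init version that ships the status subcommand, roughly 18.2 or newer):
#!/bin/sh
# Block until cloud-init has finished, then check how it ended.
if cloud-init status --wait; then
    echo "cloud-init finished successfully"
else
    echo "cloud-init reported errors" >&2
    exit 1
fi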
Another alternative is to let the cloud-init phone home once it finishes:
phone_home:
  url: http://example.com/$INSTANCE_ID/
  post:
    - pub_key_dsa
    - instance_id
    - fqdn
  tries: 10
As #flyxiao pointed out, cloud-init puts status information into a dedicated directory on the filesystem: /run/cloud-init/ (preferred over /var/lib/cloud/data/, as it is guaranteed to describe the last init process). status.json contains details about all init phases, and result.json denotes that the whole init has completed. The project documentation suggests a Python script to detect cloud-init completion:
fin = "/run/cloud-init/result.json"
if os.path.exists(fin):
ret = json.load(open(fin, "r"))
if len(ret['v1']['errors']):
print "Finished with errors:" + "\n".join(ret['v1']['errors'])
else:
print "Finished no errors"
else:
print "Not Finished"
The simplest answer is to set a tag on the instance so you can poll for its existence.
If you have a Linux host, do this last:
aws ec2 create-tags --resources `ec2metadata --instance-id` --tags Key=BootstrapStatus,Value=complete
This avoids needing to set up a network endpoint (a potential point of failure) or SSH in (which would mean securing credentials).
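A rough sketch of the polling side (assumes the AWS CLI is configured with permission to read tags, and that the instance ID is passed in as an argument):
#!/bin/bash
# Wait until the instance has tagged itself as bootstrapped.
INSTANCE_ID="$1"

until aws ec2 describe-tags \
        --filters "Name=resource-id,Values=${INSTANCE_ID}" \
                  "Name=key,Values=BootstrapStatus" \
        --query 'Tags[0].Value' --output text | grep -q complete; do
    echo "waiting for bootstrap to finish on ${INSTANCE_ID}..."
    sleep 15
done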
You can check /var/lib/cloud/data/status.json for the cloud-init status.
Or, if the host is using upstart, add an init process in /etc/init/newprocess.conf; newprocess.conf should be configured to start after cloud-final.

Invoke a shell script execution using Nagios

Hi all, I have a script that restarts all the components (.jar files) present on the server (/scripts/startAll.sh). Whenever my server goes down, I want to invoke this script using Nagios, which is running on a different Linux server. Is it possible to do so? How can I invoke execution of this script using Nagios?
Event Handlers
Nagios and Naemon allow executing custom scripts, both for hosts and for services entering a 'problem state.' Since your implementation is for restarting specific applications, yours will most likely need to be service event handlers.
From Nagios Documentation:
Event handlers can be enabled or disabled on a program-wide basis by
using the enable_event_handlers in your main configuration file.
Host- and service-specific event handlers can be enabled or disabled
by using the event_handler_enabled directive in your host and service
definitions. Host- and service-specific event handlers will not be
executed if the global enable_event_handlers option is disabled.
Enabling and Creating Event Handler Commands for a Service or Host
First, enable event handlers by modifying or adding the following line to your Nagios config file.
[IE: /usr/local/nagios/etc/nagios.cfg]:
enable_event_handlers=1
Define and enable an event handler on the service failure(s) that will trigger the script. Do so by adding two event_handler directives inside of the service you've already defined.
[IE: /usr/local/nagios/etc/services.cfg]:
define service{
        host_name               my-server
        service_description     my-check
        check_command           my-check-command!arg1!arg2!etc
        ....
        event_handler           my-eventhandler
        event_handler_enabled   1
}
The last step is to create the event_handler command named in step 2, and point it to a script you've already created. There are a few approaches to this (SSH, NRPE, Locally-Hosted, Remotely Hosted). I'll use the simplest method, hosting a BASH script on the monitor system that will connect via SSH and execute:
[IE: /usr/local/nagios/etc/objects/commands.cfg]:
define command{
        command_name    my-eventhandler
        command_line    /usr/local/nagios/libexec/eventhandlers/my-eventhandler.sh
}
In this example, the script "my-eventhandler.sh" should use SSH to connect to the remote system, and execute the commands you've decided on.
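A rough sketch of what my-eventhandler.sh could look like (the user, host, and key-based SSH trust are assumptions; the restart script path is taken from the question; in practice you would also pass Nagios macros such as $SERVICESTATE$ and only act on a HARD CRITICAL state):
#!/bin/bash
# my-eventhandler.sh - run from the Nagios monitor when the service check fails.
REMOTE_USER="nagios"     # placeholder user on the application server
REMOTE_HOST="my-server"  # host name used in the service definition above

# Restart all components on the application server.
ssh -o BatchMode=yes "${REMOTE_USER}@${REMOTE_HOST}" "/scripts/startAll.sh"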
NOTE: This is only intended as a quick, working solution for one box in your environment. In practice, it is better to create an event handler script remotely, and to use an agent such as NRPE to execute the command while passing a $HOSTNAME$ variable (thus allowing the solution to scale across more than one system). The simplest tutorial I've found for using NRPE to execute an event handler can be found here.
You can run shell scripts on remote hosts via snmpd using check_by_snmp.pl.
Take a look at https://exchange.nagios.org/directory/Plugins/*-Remote-Check-Tunneling/check_by_snmp--2F-check_snmp_extend--2F-check_snmp_exec/details
This is a very useful plugin for Nagios; I work with it a lot.
Good luck!
