How to detect that cloud-init has completed initialization/provisioning

I am configuring an OpenStack box using cloud-init/cloud-config. I intend to wait until it is fully configured before I start using it.
This is not all that hard to do with some marker file or by detecting whether the cloud-init process is still running, but it seems cumbersome to do that in every cloud-init script. Is there some recommended way, ideally one natively supported by cloud-init?

The following command does the trick:
cloud-init status --wait
From https://ubuntu.com/blog/cloud-init-v-18-2-cli-subcommands:
cloud-init status gives simple human-readable or programmatic output
for what cloud-init is doing and whether it has finished successfully.
It can be used as a sanity check on a machine or in scripts to block
until cloud-init has completed successfully.
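For example, a provisioning script can simply block on this command and branch on the outcome. A minimal sketch in Python (assuming cloud-init is on the PATH; exit-code semantics vary between cloud-init versions, so treat the error handling as an assumption):

import subprocess

# Block until cloud-init reports that it has finished.
result = subprocess.run(["cloud-init", "status", "--wait"],
                        capture_output=True, text=True)
print(result.stdout.strip())  # e.g. "status: done" or "status: error"

# Assumption: a non-zero exit code or an "error" status means init failed.
if result.returncode != 0 or "error" in result.stdout:
    raise SystemExit("cloud-init finished with errors")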

Another option is to let cloud-init phone home once it finishes:
phone_home:
  url: http://example.com/$INSTANCE_ID/
  post:
    - pub_key_dsa
    - instance_id
    - fqdn
  tries: 10
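The URL above is only an example; something has to be listening there to record the callback. A minimal, hypothetical receiver sketch in Python (this assumes cloud-init sends the listed fields as a form-encoded POST):

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

class PhoneHomeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the form-encoded fields posted by cloud-init (assumption: default encoding).
        length = int(self.headers.get("Content-Length", 0))
        fields = parse_qs(self.rfile.read(length).decode())
        print("instance finished:", fields.get("instance_id"), fields.get("fqdn"))
        self.send_response(200)
        self.end_headers()

# Hypothetical listener on port 8000; point the phone_home url at this host.
HTTPServer(("", 8000), PhoneHomeHandler).serve_forever()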

As #flyxiao pointed out, cloud-init puts status information into a dedicated directory on the filesystem: /run/cloud-init/ (preferred over /var/lib/cloud/data/ as it is guaranteed to describe the last init process). status.json contains details about all init phases, and result.json denotes that the whole init has completed. The project documentation suggests a Python script to detect cloud-init completion:
import json
import os

fin = "/run/cloud-init/result.json"
if os.path.exists(fin):
    ret = json.load(open(fin, "r"))
    if len(ret["v1"]["errors"]):
        print("Finished with errors:" + "\n".join(ret["v1"]["errors"]))
    else:
        print("Finished, no errors")
else:
    print("Not finished")

The simplest answer is to set a tag on the instance so you can poll for its existence.
If you have a Linux host, do this last:
aws ec2 create-tags --resources `ec2metadata --instance-id` --tags Key=BootstrapStatus,Value=complete
This avoids having to set up a network endpoint (a potential point of failure) or SSH in (which creates a need to secure credentials).
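Whatever orchestrates the launch can then poll for that tag. A rough sketch with boto3 (the tag key and value match the command above; the timeout, region handling, and everything else are assumptions):

import time
import boto3

ec2 = boto3.client("ec2")

def wait_for_bootstrap(instance_id, timeout=600, interval=15):
    # Poll until the BootstrapStatus=complete tag shows up, or give up after the timeout.
    deadline = time.time() + timeout
    while time.time() < deadline:
        tags = ec2.describe_tags(Filters=[
            {"Name": "resource-id", "Values": [instance_id]},
            {"Name": "key", "Values": ["BootstrapStatus"]},
        ])["Tags"]
        if any(t["Value"] == "complete" for t in tags):
            return True
        time.sleep(interval)
    return False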

You can check /var/lib/cloud/data/status.json for the cloud-init status.
Or, if the host is using upstart, add an init job in /etc/init/newprocess.conf and have it start after cloud-final, as in the sketch below.
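For the upstart route, the job only needs to be ordered after cloud-final. A minimal sketch of /etc/init/newprocess.conf (the exec target is a placeholder):

# /etc/init/newprocess.conf -- sketch
description "runs once cloud-init has finished"
start on stopped cloud-final
task
exec /usr/local/bin/after-cloud-init.sh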

Related

systemd - How can you find out whether the service was started at boot or by "systemctl restart ..."?

I wrote a service that has to behave differently depending on whether it was started after a boot or by the command "systemctl restart ...".
Can I determine that in the daemon itself? Or, alternatively, can I set an environment variable in the "daemon.service" file for the daemon?
At the moment I don't see how this could be decided, e.g. from the environment.
Thanks in advance,
Poldi
Sorry, stupid question. I just have to write a temporary file to a ram disk ;-)
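For reference, that marker-file trick can be as small as touching a file under /run, which is a tmpfs that is emptied on every boot. A minimal sketch (the path is hypothetical):

import os

MARKER = "/run/mydaemon/started-once"  # hypothetical path; /run is cleared at boot

def is_first_start_after_boot():
    if os.path.exists(MARKER):
        return False  # the daemon has already started since the last boot
    os.makedirs(os.path.dirname(MARKER), exist_ok=True)
    open(MARKER, "w").close()  # leave the marker for later restarts
    return True

print("first start after boot" if is_first_start_after_boot() else "restart")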

EC2 user-data not starting my application

I am using the user data of an EC2 instance to bring up my auto-scaling instances and run the application. I am running a Node.js application.
But it is not working properly. I have debugged and checked the instance's cloud monitor output, which says
pm2 command not found
After a lot of reading and investigating I have found that the command is not on root's PATH.
When the EC2 user data runs, it finds the path
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
After SSHing in as ec2-user it is
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin
After sudo su it is
/root/.nvm/versions/node/v10.15.3/bin:/sbin:/bin:/usr/sbin:/usr/bin
The command works only with the last path.
So what is the way, or what script should I use, to run the command as root during instance launch via user data?
First of all, starting your application with user data is not recommended, because per the AWS documentation there is no guarantee that the instance will only come up after successful execution of the user data; even if the user data fails, your instance will still spin up.
For your problem, I assume that if you give the complete absolute path to the binary, it will work:
/root/.nvm/versions/node/v10.15.3/bin/pm2
A better solution is to create a service file for your application and start it with systemd (or the service command), as sketched below.
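For example, a minimal systemd unit along these lines; the node path comes from the question, but the application path and name are placeholders, so this is a sketch rather than a drop-in file:

# /etc/systemd/system/myapp.service -- sketch
[Unit]
Description=Node.js application
After=network.target

[Service]
Environment=PATH=/root/.nvm/versions/node/v10.15.3/bin:/usr/local/bin:/usr/bin:/bin
ExecStart=/root/.nvm/versions/node/v10.15.3/bin/node /opt/myapp/server.js
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now myapp so it starts on every boot instead of relying on user data.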

Terraform, how to ignore failed servers while provisioning?

I am running Terraform against a private OpenStack cloud to bootstrap new servers. When I try to create new servers (using any method) during the busiest times of operation (weekday afternoons), usually half of the servers fail, and this has nothing to do with Terraform. The issue is that when one of the servers I try to provision fails to complete a provisioner "remote-exec" block without errors (because of my private cloud), my whole terraform apply stops.
I want Terraform to totally ignore these failed servers when I run terraform apply, so that if I try to provision 20 servers and only 1 of them launches successfully, that one server will still run through all the commands I specify in my resource block.
Is there something like an ignore_failed_resources = true line I can add to my resources so that terraform will ignore the servers that fail and run the successful ones to completion?
There's no simple config switch that you can enable to achieve this. Could you be more specific about the problem that's causing the "remote-exec" to fail?
If it's because the server is simply refusing connections, you could switch out your "remote-exec" for a "local-exec" and wrap your command in a script, passing in the hostname of your server as a parameter. The script would then handle initiating the SSH connection and running the required commands. Make sure the script fails gracefully with an exit code of 0 if an error occurs, so that Terraform will think the script worked correctly.
resource "aws_instance" "web" {
provisioner "local-exec" {
command = "./myremotecommand.sh ${aws_instance.web.private_ip}"
}
}
I'm guessing you already figured out a solution, but I'm adding mine here for anyone who encounters this in the future.
Why not simply do this in the command part?
provisioner "local-exec" {
command = "my command || true"
}
This way it always returns exit code 0, so the shell ignores the failure and Terraform ignores it as well.
If you look at the Terraform provisioner documentation, you can put on_failure = continue on the provisioner, so it will continue on failure. See the sketch below.
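A rough sketch of how that could look on the remote-exec provisioner from the question (the resource type matches the OpenStack provider, but the arguments and command are placeholders):

resource "openstack_compute_instance_v2" "server" {
  # ... image, flavor, and network arguments ...

  provisioner "remote-exec" {
    inline     = ["/tmp/bootstrap.sh"]
    on_failure = continue  # keep the apply going even if this server fails
  }
}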

vagrant / puppet init.d script reports start when no start occurred

So, I'm struggling with a fairly major problem. I've tried multiple workarounds to try to get this working, but there is something happening between Puppet and the actual server that is just boggling my mind.
Basically, I have an init.d script, /etc/init.d/rserve, which is copied over correctly and works perfectly when used from the command line on the server (i.e. sudo service rserve start|stop|status); the service returns correct exit codes, based on testing with echo $? on the different commands.
The puppet service statement is as follows:
service { 'rserve':
  ensure  => running,
  enable  => true,
  require => [File["/etc/init.d/rserve"], Package['r-base'], Exec['install-r-packages']],
}
When Puppet hits this service, it runs its status method, sees that it isn't running, sets it to running, and presumably starts the service. The output from Puppet is below:
==> twine: debug: /Schedule[weekly]: Skipping device resources because running on a host
==> twine: debug: /Schedule[puppet]: Skipping device resources because running on a host
==> twine: debug: Service[rserve](provider=upstart): Could not find rserve.conf in /etc/init
==> twine: debug: Service[rserve](provider=upstart): Could not find rserve.conf in /etc/init.d
==> twine: debug: Service[rserve](provider=upstart): Could not find rserve in /etc/init
==> twine: debug: Service[rserve](provider=upstart): Executing '/etc/init.d/rserve status'
==> twine: debug: Service[rserve](provider=upstart): Executing '/etc/init.d/rserve start'
==> twine: notice: /Stage[main]/Etl/Service[rserve]/ensure: ensure changed 'stopped' to 'running'
Now when I actually check for the service using sudo service rserve status or ps aux | grep Rserve, the service is in fact NOT running, and a quick sudo service rserve start shows the init.d script is working fine: the service starts and is visible with ps aux.
Is there something I'm missing here? I've even tried starting the service by creating a puppet Exec { "sudo service rserve start"}, which still reports that it executed successfully, but the service is still not running on the server.
tl;dr: Puppet says a service started when it hasn't, and there's seemingly nothing wrong with the init.d script, its exit codes, or otherwise.
Update 1
In the comments below you can see I tried isolating the service in its own test.pp file and running it with puppet apply on the server, with the same result.
Update 2
I've now tried creating an .sh file with the command to start Rserve, run as a separate Vagrant provisioner, and can finally see an error. However, the error is confusing: it does not occur when simply running sudo service rserve start. Something in the way Vagrant executes .sh commands, or the user it executes them as, causes an option to be dropped from the command inside the init.d script when it is executed.
The error is R- and Rserve-specific: it complains about a missing --no-save flag that needs to be passed to R, when that flag is in fact present in the init.d script and is passed correctly when SSH'd into the Vagrant box and using the init.d commands.
Update 3
I've managed to get the whole process working at this point; however, it's one of those situations where the steps to get it working didn't really reveal why the original problem existed. I'm going to replicate the broken version and see if I can figure out exactly what was going on, using one of the methods mentioned in the comments, so that I can post an answer that will help someone out later on. If anyone has insight into why this might have been happening, feel free to answer in the meantime. To clarify the situation a bit, here are some details:
The service's dependencies were installed correctly using puppet
The service used a script in /etc/init.d on Ubuntu to start|stop the Rserve service
The software in question is R (r-base) and Rserve (a communication layer between other languages and R)
Running the command sudo service rserve start from the command-line worked as expected
The init.d script returned correct error codes
A service {} block was being used to start the service from puppet
Puppet reported starting the service when the service wasn't started
Adding a provision option to the Vagrantfile for an .sh file containing sudo service rserve start revealed that some arguments in the init.d script were being ignored when run by Vagrant's provisioning, but not by a user in an interactive shell.

Smartfoxserver 2X linux 64 running on EC2 via dotcloud - how to install?

I am currently trying to deploy SmartFoxServer 2X on EC2 using dotcloud. I have been able to detect the private IP of the Amazon instance, and using the dotcloud tools I have been able to determine the correct port. However, I am having difficulty installing the server itself via the command line so that I can log into it using the AdminTool.
My postinstall is fairly straightforward:
./SFS2X/sfs2x-service start-launchd
I find that on 'dotcloud push' there is a fair amount of promising output in my Cygwin terminal, but the push hangs after saying that the sfs2x-service has launched correctly, until it times out.
Consequently, my question is: has anyone found a way to install SFS2X on EC2 via dotcloud successfully? I managed to have partial success with SFS Pro, with a complete push to dotcloud, by calling ./jre/bin/java -jar installer.jar in my postinstall. Do I need to do extra legwork and build an installer jar for SFS2X? What would be the best way to do this?
I do understand that there is a standard approach to deployment with SFS2X using RightScale on EC2, however I am interested in deployment using the dotcloud platform.
Thanks in advance.
The reason it is hanging is that you are trying to start your process in the postinstall script, and that is not the correct place to do it. The postinstall script is supposed to finish; if it doesn't, the deployment will time out and then get cancelled.
Once the postinstall script finishes, the platform will complete the rest of your deployment.
See this page for more information about dotCloud postinstall script:
http://docs.dotcloud.com/0.9/guides/hooks/#post-install
Pay attention to this warning at the end.
Warning:
If your post-install script returns an error (non-zero exit code), or if it runs for more than 10 minutes, the platform will consider that your build has failed, and the new version of your code will not be deployed.
Instead of putting this in the postinstall script, you should add it as a background process, so that it starts up once the deployment process is complete.
See this page for more information on adding background processes to dotCloud services:
http://docs.dotcloud.com/0.9/guides/daemons/
TL;DR: You need to create a supervisord.conf file, and add it to the root of your project, and add your service to that.
Example (you will need to change it to fit your situation):
[program:smartfoxserver]
command = /home/dotcloud/current/SFS2X/sfs2x-service start-launchd
Also, make sure you have the correct dotCloud service specified in your dotcloud.yml so that the correct binary and libraries are installed for your smartfoxserver application.
