vagrant / puppet init.d script reports start when no start occurred

I'm struggling with a fairly major problem. I've tried multiple workarounds to get this working, but something happening between Puppet and the actual server is boggling my mind.
Basically, I have an init.d script, /etc/init.d/rserve, which is copied over correctly and works perfectly when used from the command line on the server (i.e. sudo service rserve start|stop|status); the service returns correct exit codes for each command, verified with echo $?.
The puppet service statement is as follows:
service { 'rserve':
  ensure  => running,
  enable  => true,
  require => [File["/etc/init.d/rserve"], Package['r-base'], Exec['install-r-packages']]
}
When Puppet hits this service, it runs its status method, sees that the service isn't running, sets it to running and presumably starts it. The output from Puppet is below:
==> twine: debug: /Schedule[weekly]: Skipping device resources because running on a host
==> twine: debug: /Schedule[puppet]: Skipping device resources because running on a host
==> twine: debug: Service[rserve](provider=upstart): Could not find rserve.conf in /etc/init
==> twine: debug: Service[rserve](provider=upstart): Could not find rserve.conf in /etc/init.d
==> twine: debug: Service[rserve](provider=upstart): Could not find rserve in /etc/init
==> twine: debug: Service[rserve](provider=upstart): Executing '/etc/init.d/rserve status'
==> twine: debug: Service[rserve](provider=upstart): Executing '/etc/init.d/rserve start'
==> twine: notice: /Stage[main]/Etl/Service[rserve]/ensure: ensure changed 'stopped' to 'running'
Now, when I actually check for the service using sudo service rserve status or ps aux | grep Rserve, the service is in fact NOT running, and a quick sudo service rserve start shows the init.d script is working fine: Rserve starts and is visible with ps aux.
Is there something I'm missing here? I've even tried starting the service with a Puppet Exec { "sudo service rserve start" }, which also reports that it executed successfully, but the service is still not running on the server.
tl;dr: Puppet says a service started when it hasn't, and there's seemingly nothing wrong with the init.d script, its exit codes or otherwise.
Update 1
In the comments below you can see I tried isolating the service in its own test.pp file and running it with puppet apply on the server, with the same result.
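For reference, applying the isolated resource directly on the box looks something like this (the manifest path is just wherever the snippet was saved):
# Apply only the isolated service resource, with full debug output
sudo puppet apply --debug /tmp/test.pp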
Update 2
I've now tried creating an .sh file containing the command to start Rserve and running it as a separate Vagrant shell provisioner, and I can finally see an error. However, the error is confusing: it does not occur when simply running sudo service rserve start. Something in the way Vagrant executes .sh commands, or the user it executes them as, causes an option to be dropped from the command inside the init.d script when it runs.
The error is R- and Rserve-specific, but it complains about a missing --no-save flag that needs to be passed to R, even though that flag is present in the init.d script and is passed correctly when I SSH into the Vagrant box and use the init.d commands.
Update 3
I've managed to get the whole process working at this point; however, it's one of those situations where the steps that got it working didn't really reveal why the original problem existed. I'm going to replicate the broken version and see if I can figure out exactly what was going on, using one of the methods mentioned in the comments, so that I can potentially post an answer that will help someone out later on. If anyone has insight into why this might have been happening, feel free to answer in the meantime. To clarify the situation a bit, here are some details:
The service's dependencies were installed correctly using puppet
The service used a script in /etc/init.d on Ubuntu to start|stop the Rserve service
The software in question is R (r-base) and Rserve (a communication layer between other langs and R)
Running the command sudo service rserve start from the command-line worked as expected
The init.d script returned correct error codes
A service {} block was being used to start the service from puppet
Puppet reported starting the service when the service wasn't started
Adding a provision option to the Vagrantfile for an .sh file containing sudo service rserve start revealed that some arguments in the init.d script were being ignored when run by Vagrant's provisioning but not by a user active on the shell (see the debugging sketch below).
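For anyone trying to reproduce this kind of discrepancy, one useful debugging step is to run the init script with tracing under a near-empty environment and compare environments. This is only a sketch; env -i approximates the stripped-down session a provisioner or the Puppet agent gets, it is not an exact match:
# Trace the init script under a near-empty environment, similar to a
# non-login provisioner session
sudo env -i /bin/sh -x /etc/init.d/rserve start

# Compare an interactive root environment with a bare one
sudo -i env | sort > /tmp/env_interactive
sudo env -i /bin/sh -c 'env | sort' > /tmp/env_bare
diff /tmp/env_interactive /tmp/env_bare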

Related

puppet is not creating agent_catalog_run.lock file

I have an Ansible script that starts the Puppet agent and then waits for the /var/lib/puppet/state/agent_catalog_run.lock file.
I found that this file is not getting created on the target machine.
The Ansible version is 1.9.7 and the Puppet agent version is 3.8.7.
I checked on the target Linux machine and the Puppet agent is running.
Below is the Ansible task:
- name: ensure that puppet lock file is created
  wait_for:
    path: /var/lib/puppet/state/agent_catalog_run.lock
    timeout: 1800
What should be checked in this scenario?
(Note: No puppet logs have been created.)
The code is simply checking for the wrong file.
As the name suggests, the agent_catalog_run.lock is:
A lock file to indicate that a puppet agent catalog run is currently in progress. The file contains the pid of the process that holds the lock on the catalog run.
In other words, that file will only be there if a Puppet agent run is occurring.
You may want the pidfile instead, which is:
The file containing the PID of a running process. This file is intended to be used by service management frameworks and monitoring systems to determine if a puppet process is still in the process table.
Default: $rundir/${run_mode}.pid
(Where $run_mode would be "agent".)
Note that you can inspect your actual settings using puppet config print, e.g.:
▶ puppet config print pidfile
/Users/alexharvey/.puppetlabs/var/run/main.pid
Yours will be different because mine is running as a non-root user on a Mac OS X laptop. Thus, I think you will need to change your code to:
- name: wait for the puppet PID file to be created
  wait_for:
    path: /var/run/agent.pid
    timeout: 1800
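Before hard-coding that path, it's worth confirming it on the target machine itself, since the default varies with how the agent runs. A quick check (--section agent just makes sure you get the agent's value):
# Print the PID file path the agent actually uses
puppet config print pidfile --section agent

# Confirm the running agent has written it
cat "$(puppet config print pidfile --section agent)"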

Is there a CLI command to report if vagrant provisioning is complete?

While vagrant up is executing, any call to vagrant status will report that the machine is 'running', even if the provisioning is not yet complete.
Is there a simple command for asking whether the vagrant up call is done and the machine is fully-provisioned?
You could have your provision script write to a networked file and query that. Or you could use vagrant ssh -c /check/for/something if there were a file or service to check against. Your provision script could also ping out to a listener you set up.
You could also use the Vagrant log or debug output to check when provisioning is done.
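A minimal sketch of the marker-file idea (the path is an arbitrary example): have the provision script create a marker as its very last step, then poll for it from the host.
# Last step of the provision script on the guest
touch /var/tmp/provisioning-complete

# On the host: exits 0 only once provisioning has finished
vagrant ssh -c 'test -f /var/tmp/provisioning-complete'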

Starting Bitbucket Server in Ansible

I'm using Vagrant and Ansible to create my Bitbucket Server on Ubuntu 15.10. I have the server setup complete and working, but I have to manually run the start-webapp.sh script to start the server each time I reprovision it.
I have the following task in my Bitbucket role in Ansible. When I increase the verbosity I can see that I get a positive response from the server saying it will be running at http://localhost/, but when I go to the URL the server isn't up. If I then SSH into the server and run the script myself, I get the exact same response, and then I can see the startup webpage.
- name: Start the Bitbucket Server
  become: yes
  shell: /bitbucket-server/atlassian-bitbucket-4.7.1/bin/start-webapp.sh
Any advice on how to fix this would be great.
Thanks,
Sam
Probably better to change that to an init script and use the service module to start it. For example, see this role for installing bitbucket...
Otherwise, you're subject to HUP and other issues from running processes under an ephemeral session.
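A rough outline of that kind of init script, reusing the paths from the question (a sketch only, not a full LSB script; it assumes the stop-webapp.sh companion script that normally ships next to start-webapp.sh):
#!/bin/sh
# /etc/init.d/bitbucket - thin wrapper around the Atlassian start/stop scripts
BITBUCKET_BIN=/bitbucket-server/atlassian-bitbucket-4.7.1/bin

case "$1" in
  start)
    "$BITBUCKET_BIN/start-webapp.sh"
    ;;
  stop)
    "$BITBUCKET_BIN/stop-webapp.sh"
    ;;
  restart)
    "$BITBUCKET_BIN/stop-webapp.sh"
    "$BITBUCKET_BIN/start-webapp.sh"
    ;;
  *)
    echo "Usage: $0 {start|stop|restart}" >&2
    exit 1
    ;;
esac
With something like that in place, the Ansible task can use the service module (service: name=bitbucket state=started) instead of shell, so the webapp isn't tied to the provisioning session.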

how to unlock a vagrant machine while it is being provisioned

Our Vagrant box takes ~1h to provision, so when vagrant up is run for the first time, I would like to package the box into an image in a local folder at the very end of the provisioning process, so it can be used as a base box the next time the machine needs to be rebuilt. I'm using the vagrant-triggers plugin to place the code right at the end of the :up process.
Relevant (shortened) Vagrantfile:
pre_built_box_file_name = 'image.vagrant'
pre_built_box_path = 'file://' + File.join(Dir.pwd, pre_built_box_file_name)
pre_built_box_exists = File.file?(pre_built_box_path)

Vagrant.configure(2) do |config|
  config.vm.box = 'ubuntu/trusty64'
  config.vm.box_url = pre_built_box_path if pre_built_box_exists
  config.trigger.after :up do
    if not pre_built_box_exists
      system("echo 'Building gett vagrant image for re-use...'; vagrant halt; vagrant package --output #{pre_built_box_file_name}; vagrant up;")
    end
  end
end
The problem is that vagrant locks the machine while the current (vagrant up) process is running:
An action 'halt' was attempted on the machine 'gett',
but another process is already executing an action on the machine.
Vagrant locks each machine for access by only one process at a time.
Please wait until the other Vagrant process finishes modifying this
machine, then try again.
I understand the dangers of two processes provisioning or modifying the machine at the same time, but this is a special case where I'm certain the provisioning has completed.
How can I manually "unlock" vagrant machine during provisioning so I can run vagrant halt; vagrant package; vagrant up; from within config.trigger.after :up?
Or is there at least a way to start vagrant up without locking the machine?
vagrant
This issue was fixed in GH #3664 (2015). If it is still happening, it's probably related to plugins (such as AWS), so try without plugins.
vagrant-aws
If you're using AWS, then follow this bug/feature report: #428 - Unable to ssh into instance during provisioning, which is currently pending.
However, there is a pull request which fixes the issue:
Allow status and ssh to run without a lock #457
So apply the fix manually, or wait until it's fixed in the next release.
If you get this error for machines that are no longer valid, try running the vagrant global-status --prune command.
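If the underlying goal is simply to package the box once after the first full provision, another workaround is to drive the whole sequence from a small wrapper script outside Vagrant, so nothing has to call vagrant halt from inside a locked vagrant up. A sketch, reusing the output file name from the question:
#!/bin/sh
# Bring the machine up, then package it as a reusable base box if one
# doesn't exist yet. Run this instead of calling `vagrant up` directly.
PRE_BUILT_BOX=image.vagrant

vagrant up
if [ ! -f "$PRE_BUILT_BOX" ]; then
  echo 'Building vagrant image for re-use...'
  vagrant halt
  vagrant package --output "$PRE_BUILT_BOX"
  vagrant up
fi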
Definitely a bit more of a hack than a solution, but I'd rather a hack than nothing.
I ran into this issue and nothing suggested here was working for me. Even though this is 6 years old, it's what came up in a Google search (along with precious little else), so I thought I'd share what solved it for me in case anyone else lands here.
My Setup
I'm using Vagrant with the ansible-local provisioner on a local VirtualBox VM, which provisions remote AWS EC2 instances (i.e. ansible-local runs on the VirtualBox instance, Vagrant provisions the VirtualBox instance, and Ansible handles the cloud). This setup is largely because my host OS is Windows and it's a little easier to take Microsoft out of the equation on this one.
My Mistake
I ran an Ansible shell task with a command that doesn't terminate without user input (and did not run it with & to put it in the background).
My Frustration
Even in the Linux subsystem, trying ps aux | grep ruby or ps aux | grep vagrant was unhelpful because the PID would change every time. There's probably a reason for this, likely something to do with how the subsystem works, but I don't know what it is.
My Solution
Just kill the AWS EC2 instances manually, in the console or the CLI, whichever you prefer. The terminal where you were running vagrant provision or vagrant up should then finally complete and spit out the summary output, even if you Ctrl+C'd out of the command.
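For the CLI route, that's a one-liner (the instance ID is a placeholder, and terminate-instances assumes you actually want the instance gone rather than just stopped):
# Kill the stuck instance(s) directly; the local vagrant process then gives up
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0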
Hoping this helps someone!

Puppet agent daemon not reading a facter fact (EC2, cloud-init)

I am using puppet to read a fact from facter, and based on that I apply a different configuration to my modules.
Problem:
The Puppet agent isn't seeing this fact. Running puppet agent --test interactively works as expected. Even running it non-interactively from a script seems to work fine. Only the agent itself is screwing up.
Process:
I am deploying an Ubuntu-based app stack on EC2. Using userdata (#cloud-config), I set an environment variable in /etc/environment:
export FACTER_tl_role=development
Then, immediately in #cloud-config, I source /etc/environment.
Only THEN do I apt-get install puppet (I moved away from using package: puppet to eliminate ambiguity in the sequence of #cloud-config steps).
Once the instance boots, I confirm that the fact is available: running facter tl_role returns "development". I then check /var/log/syslog, and apparently the puppet agent is not seeing this fact - I know this because it's unable to compile the catalog, and there's nothing (blank) where I'm supposed to be seeing the value of the variable set depending on this fact.
However, running puppet agent --test interactively compiles and runs the catalog just fine.
even running this from the #cloud-config script (immediately after installing puppet) also works just fine.
How do I make this fact available to the Puppet agent? Restarting the agent service makes no difference; it remains unaware of the custom fact. Rebooting the instance also makes no difference.
Here's some code:
EC2 userdata:
#cloud-config
puppet:
  conf:
    agent:
      server: "puppet.foo.bar"
      certname: "%i.%f"
      report: "true"

runcmd:
  - sleep 20
  - echo 'export FACTER_tl_role=development' >> /etc/environment
  - . /etc/environment
  - apt-get install puppet
  - puppet agent --test
Main puppet manifest:
# /etc/puppet/manifests/site.pp
node default {
  case $tl_role {
    'development': { $sitedomain = "dev.foo.bar" }
    'production':  { $sitedomain = "new.foo.bar" }
  }

  class { "code":                 sitedomain => $sitedomain }
  class { "apache::site":         sitedomain => $sitedomain }
  class { "nodejs::grunt-daemon": sitedomain => $sitedomain }
}
And then I see failures where $sitedomain is supposed to be, so $tl_role appears not to be set.
Any ideas? This is exploding my brain....
Another easy option would be to use an external fact.
Dropping a file into /etc/facter/facts.d/* is fairly easy, and you can use a text file, YAML, JSON, or an executable to do it.
http://docs.puppetlabs.com/guides/custom_facts.html#external-facts
* That's on open source Puppet, on unix-y machines. See the link for the full docs.
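In this particular case, that could be as simple as having cloud-init drop one small file instead of editing /etc/environment (a sketch; key=value .txt files are one of the supported external fact formats):
# Create the external facts directory and add the fact
mkdir -p /etc/facter/facts.d
echo 'tl_role=development' > /etc/facter/facts.d/tl_role.txt

# Verify it resolves the same way the agent will see it
facter tl_role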
Thank you, @christopher. This may be a good solution; I will test it and possibly move to it from my current horrible hack.
The answer I got in the Puppet Users Google Group was that I should not assume that the Puppet agent process will have the environment of a login shell, and that Facter will have that same environment when it is run by the Puppet agent.
Here is the way I solved it (admittedly, by brute force):
runcmd:
  - echo 'export FACTER_tl_role=development' >> /etc/environment
  - . /etc/environment
  - apt-get install puppet
  - service puppet stop
  - sed -i '/init-functions/a\. \/etc\/environment' /etc/init.d/puppet
  - puppet agent --test
  - service puppet start
As you can see, after installing Puppet, I stop the agent, and add a line to /etc/init.d/puppet to source /etc/environment. Then I start the agent. NOT ideal... but it works!
I don't think . /etc/environment is going to work properly the way cloud-init executes runcmd. Two possible solutions:
Export the variable with the puppet agent command:
export FACTER_tl_role=development && puppet agent --test
If that doesn't work:
Just drop the commands into a user-data script and wire them together as a "multipart input" (described in the cloud-init docs).
The second solution executes the commands as a proper shell script, and would likely fix the problem. If the first works, though, it's easier to do with what you have.
