Puppet - How to only run 'apt-get update' if a package needs to be installed or updated - vagrant

I can't seem to figure out how to get Puppet to not run 'apt-get update' during every run.
The standard yet inefficient way:
The way I've been doing this is with the main Puppet manifest having:
exec { 'apt-get update':
path => '/usr/bin',
}
Then each subsequent module that needs a package installed has:
package { 'nginx':
ensure => 'present',
require => Exec['apt-get update'],
}
The problem with this is that, every time Puppet runs, Apt gets updated. This puts unnecessary load on our systems and network.
The solution I tried, but fails:
I looked in the Puppet docs and read about subscribe and refreshonly.
Refresh: exec resources can respond to refresh events (via notify, subscribe, or the ~> arrow). The refresh behavior of execs is non-standard, and can be affected by the refresh and refreshonly attributes:
If refreshonly is set to true, the exec will only run when it receives an event. This is the most reliable way to use refresh with execs.
subscribe
One or more resources that this resource depends on, expressed as resource references. Multiple resources can be specified as an array of references. When this attribute is present:
The subscribed resource(s) will be applied before this resource.
so I tried this in the main Puppet manifest:
# Setup this exec type to be used later.
# Only gets run when needed via "subscribe" calls when installing packages.
exec { 'apt-get update':
path => '/usr/bin',
refreshonly => true,
}
Then this in the module manifests:
# Ensure that Nginx is installed.
package { 'nginx':
ensure => 'present',
subscribe => Exec['apt-get update'],
}
But this fails because apt-get update doesn't get run before installing Nginx, so Apt can't find it.
Surely this is something others have encountered? What's the best way to solve this?

Puppet has a hard time coping with this scenario, because all resources are synchronized in a specific order. For each resource Puppet determines whether it needs a sync, and then acts accordingly, all in one step.
What you would need is a way to implement this process:
check if resource A (a package, say) needs a sync action (e.g., needs installing)
if so, trigger an action on resource B first (the exec for apt-get update)
once that is finished, perform the operation on resource A
And while it would be most helpful if there was such a feature, there currently is not.
It is usually the best approach to try and determine the necessity of apt-get update from changes to the configuration (new repositories added, new keys installed etc.). Changes to apt's configuration can then notify the apt-get update resource (declared with refreshonly => true, so that it runs only when notified). All packages can safely require this resource.
For the regular refreshing of the database, it is easier to rely on a daily cronjob or similar.

I run 'apt-get update' in a cron script on a daily basis, under the assumption that I don't care if it takes up to 24 hours to update OS packages via apt. Thus...
file { "/etc/cron.daily/updates":
source => "puppet:///modules/myprog/updates",
mode => '0755',
}
Where /etc/cron.daily/updates is, of course:
#!/bin/sh
apt-get -y update
Then for the applications, I just tell puppet something like:
# Ensure that Nginx is installed.
package { 'nginx':
ensure => latest
}
And done, once apt-get update runs, nginx will get updated to the latest version within the next twenty minutes or so (the next time puppet runs its recipe). Note that this requires you to have done 'apt-get update' in the initial image via whatever process you used to install puppet into the image (for example, if this is in CloudFormation, via the UserData section of the LaunchConfiguration). That is a reasonable requirement, IMHO.
If you want to do 'apt-get update' more often, you'll need to put a cron script into /etc/cron.d with the times you want to run it. I plopped it into cron.daily because that was often enough for me.

This is what you need to do: create an apt-get wrapper that runs apt-get update and then calls the real apt-get (/usr/bin/apt-get) for the install. Install the wrapper into a directory that comes before /usr/bin in PATH.
Modify /usr/lib/ruby/vendor_ruby/puppet/provider/package/apt.rb and locate the line:
commands :aptget => "/usr/bin/apt-get"
(it will be right below has_features :versionable, :install_options)
replace that line with:
commands :aptget => "apt-get"
You're done. For some boneheaded reason puppet insists on calling commands with absolute path rather than using a sane PATH variable.

Related

Puppet: generate statement fails when trying to retrieve default path of an executable

I have built a stanza to remove a ruby gem package from our servers. The problem is that the ruby gem executable is installed in different paths on the servers, so on one server it could be in /opt/ruby/bin/gem on other servers it's in /usr/local/rvm/rubies/ruby-2.0.0-p353/bin/gem
My stanza uses the generate function in puppet to pull out the default ruby gem installation as follows:
$ruby_gem_location = generate('which', 'gem')
exec { "remove-remote_syslog":
command => "gem uninstall remote_syslog",
path => "$ruby_gem_location:/opt/ruby/bin:/usr/bin:/usr/sbin",
onlyif => "$ruby_gem_location list|grep remote_syslog"
}
When I run puppet agent I get the following error:
Generators must be fully qualified at ****redacted*
I have also tried to provide a default path for the which command as follows:
$ruby_gem_location = generate('/usr/bin/which', 'gem')
and now the error says : Could not evaluate: Could not find command '/usr/bin/gem
I checked the target server and the gem command is in
/usr/local/rvm/rubies/ruby-2.0.0-p353/bin/gem
What am I doing wrong?
How can I pull out the default ruby gem location on our servers?
Thank you in advance
Your code
$ruby_gem_location = generate('/usr/bin/which', 'gem')
will generate a full path to your gem command (if it succeeds). From the result you describe, I think it is generating '/usr/bin/gem', which is perhaps a symlink to the real gem command. You are putting that into your command path instead of just the directory part, and that will not be helpful. It is not, however, the source of the error message you report.
The real problem here is that generate(), like all DSL functions, runs during catalog building. I infer from your results that you are using a master / agent setup, so generate() is giving you a full path to gem -- evidently /usr/bin/gem -- on the master. Since the whole point is that different servers have gem installed in different places, this is unhelpful. The actual error message arises from an attempt to execute your onlyif command with the wrong path to gem.
Your best way forward is probably to create a custom fact with which each node can report the appropriate location of the gem binary. You can then use that fact's value in your Exec, maybe:
exec { "remove-remote_syslog":
command => "$::ruby_gem_path uninstall remote_syslog",
onlyif => "$::ruby_gem_path list | grep remote_syslog"
}
Note that you don't need a path attribute if you give a complete path to the executable in the first place.
Details on creating the $::ruby_gem_path custom fact depend on a number of factors, and in their full generality they are rather too broad for Stack Overflow, but Puppet Labs provides good documentation.
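As a rough illustration of the custom-fact approach, here is a minimal sketch of a Ruby fact file. It assumes the fact is shipped in a module's lib/facter directory, reuses the name ruby_gem_path from the Exec above, and searches the agent's PATH plus the two locations mentioned in the question. The helper names find_executable and gem_search_dirs and the RVM glob pattern are hypothetical and would need adjusting to the real layout of the servers.

```ruby
# Hypothetical location: <module>/lib/facter/ruby_gem_path.rb

# Return the first executable file called `name` found in `dirs`, or nil.
def find_executable(name, dirs)
  dirs.each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Directories to search: the agent's PATH first, then the locations from the
# question. The RVM glob is an assumption about how those servers are set up.
def gem_search_dirs
  path_dirs = ENV.fetch('PATH', '').split(File::PATH_SEPARATOR).reject(&:empty?)
  path_dirs + ['/opt/ruby/bin'] + Dir.glob('/usr/local/rvm/rubies/*/bin').sort
end

# Register the fact only when Facter has loaded this file, so the helpers
# above can also be tried out from plain Ruby.
if defined?(Facter)
  Facter.add(:ruby_gem_path) do
    setcode { find_executable('gem', gem_search_dirs) }
  end
end
```

On a node where no gem binary is found the fact resolves to nil, so the Exec should probably be declared only when the fact actually has a value.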

Dependency loop during Puppet provisioning due to missing OS package

I'm trying to provision my development server using Vagrant and Puppet. Below is part of my Puppet manifest at this point. The issue I'm having is that I end up in a dependency loop, which is of course correct. The only problem is that I don't see a way to do it without one, so I need some help.
I'm using the latest version of the box provided by Puppetlabs named puppetlabs/ubuntu-14.04-64-puppet. While adding a PPA to the package manager I receive an error that apt-add-repository is not available. Therefore you need to install the software-properties-common package.
The only problem is that before installing this package, you need to run apt-get update. The second problem is that the manifest won't accept that ordering: it tries to add the PPA first so that, logically enough, it only has to update the package manager once. But with this last approach I end up in a loop, which triggers an error:
==> default: Error: Failed to apply catalog: Found 1 dependency cycle:
==> default: (Exec[add-apt-repository-ppa:ondrej/php-7.0] => Class[Apt::Update] => Exec[apt_update] => Class[Apt::Update] =>
Package[git] => Class[Systempackages] => Apt::Ppa[ppa:ondrej/php-7.0]
=> Exec[add-apt-repository-ppa:ondrej/php-7.0])
class systempackages {
package { [ 'git', 'curl', 'acl', 'unattended-upgrades', 'vim', 'software-properties-common']:
ensure => "installed",
require => [
Class['apt::update'],
],
}
}
/*===========================================*/
## System
Exec { path => [ "/bin/", "/sbin/" , "/usr/bin/", "/usr/sbin/" ] }
class{'systempackages':}
# APT
class { 'apt':
update => {
frequency => 'always',
},
}
apt::ppa { 'ppa:ondrej/php-7.0':
before => Package['php7.0-cli'],
require => Class['systempackages'],
}
# PHP
package {'php7.0-cli':
ensure => 'installed',
}
Given that this is on vagrant, I suggest installing package software-properties-common manually as part of your Vagrantfile.
Something like config.vm.provision "shell", inline: "apt-get update && apt-get install -y software-properties-common" should work.
The circular dependency reflects the fact that Puppet is not a provisioning system. It can be used by a provisioning system or in conjunction with one, but it depends on a fairly substantial software stack being available before it can get off the ground. If Package 'software-properties-common' is necessary for full functioning of the Apt subsystem, then your best bet is to rely on your provisioning system to install it, so that it is available before Puppet ever runs, and to avoid declaring any relationship between that package and the classes and resources of the Apt module.
You are also impacted by the puppetlabs-apt module being quite good about declaring the relationships needed to ensure proper order of application. This is a double-edged sword, however: people cause themselves trouble with surprising frequency by declaring their own relationships with classes or defined types from that module that conflict with the ones it declares itself. In particular, it is asking for trouble to have your Apt::Ppa resource require a class containing resources that themselves require any class or resource from the Apt module.
In any case, class apt::update is not a public class of the module. The main implication is that code outside the module should not reference it in any way. You should instead rely on the value you provided for class parameter $apt::update to instruct Puppet to perform an apt-get update at a suitable time.

Puppet - unable to execute ONLY ONCE ordered chain of Exec commands after notification

TLDR:
I can't configure an ordered chain of Puppet "Exec" commands to run ONLY ONCE.
Details:
I want to use Vagrant and Puppet modules to set up a VM with Redmine installed and some sample data loaded into it.
I'm using https://forge.puppetlabs.com/johanek/redmine and it works great - Redmine is installed and it works.
My goal:
Now I want to load sample data into Redmine using REST API:
Create 1 test project
Import 2 issues into this project
I want to run 2 simple "Exec", one after another and ONLY ONCE, but I can't achieve this, hence the question.
My current effort:
I've tried to subscribe to one of the last steps in the Redmine installation
subscribe => [Exec['rails_migrations']]
and then import data, but the first step "create-project1" always notifies the second step "import-issues", so it creates duplicated data.
And if I run vagrant provision a few times, "import-issues" creates duplicates of these issues.
Here is my code:
exec {'create-project1':
subscribe => [Exec['rails_migrations']],
path => ['/usr/bin', '/usr/sbin', '/bin'],
creates => "$redmine_install_dir/.data_loaded",
command => "curl WHICH_CREATES_PROJECT && touch $redmine_install_dir/.data_loaded",
notify => [Exec['import-issues']],
} ->
exec {'import-issues':
path => ['/usr/bin', '/usr/sbin', '/bin'],
command => "curl WHICH_IMPORTS_ISSUES",
refreshonly => true,
}
Question:
How do I configure those Exec commands to run in a chain and ONLY ONCE?
I'm also thinking about extending this chain to 5 commands in the near future, so keep that in mind.
You were almost there with 'ONLY ONCE': Puppet's exec has an onlyif attribute that you can use to test whether a file already exists.
You could then do something like:
exec {'create-project1':
subscribe => [Exec['rails_migrations']],
path => ['/usr/bin', '/usr/sbin', '/bin'],
onlyif => "test ! -f $redmine_install_dir/.data_loaded",
command => "curl WHICH_CREATES_PROJECT && touch $redmine_install_dir/.data_loaded",
notify => [Exec['import-issues']],
}
which tests for the existence of $redmine_install_dir/.data_loaded. You should be able to play a bit with that to achieve what you want.

Run command after gem install from gem root folder

I'm deploying a Sinatra app as a gem. I have a command that starts the app as a service.
We are using chef to manage our deployments.
How can I run the command to start the app service but only after it's fully installed (including run-time dependencies)?
I've tried Googling for trying to run a post-install script but I haven't found anything that is of use or concrete without some complicated 'extconf.rb' work around
I would prefer not to use an execute resource if I can help it.
EDIT: I tried what was suggested, but it breaks things in a way that causes berkshelf not to work in our pipeline.
Here's the code I'm using:
execute "run-service:post_install" do
cwd (f = File.expand_path(__FILE__).split('/')).shift(f.length - 3).join('\\')
timeout 5
command "bundle && rake service:post_install"
# action :nothing
# subscribes :run, "gem_package[gem_name]" , :delayed
end
It doesn't matter whether I uncomment the last two lines or not; it just breaks things, but if I take out the whole thing it stops breaking things. Obviously I'm doing something wrong, but I'm not sure what.
EDIT:
It's the command itself that breaks it: when I change command to ls and action to :run, it breaks.
EDIT: After changing the command path around a bit I managed to get it to spit out a usable error: it was trying to run the command from the Chef cookbooks path, so I've (hopefully) forced it to use the correct path.
Why do you not want to use an execute resource? That is exactly what it is for, running commands from Chef. Chef obeys the order of the resources, so if you have a gem_package followed by an execute they will run in that order.
So, in the end I decided to try using the service resource because it allows you to set start and stop commands.
The code that I used is :
service service_name do
init_command ("#{%x(gem env gemdir).strip.gsub('/','\\')}\\gems\\gem_name-#{installing_version}")
start_command "rake service:start"
stop_command "rake service:stop"
reload_command "rake service:reload"
restart_command "rake service:restart"
supports start: true, restart: true, reload: true
action [:enable,:start]
end
I'm still having problems but this is of a different sort.

Make chef cookbook recipe only run once

So I use the following recipe:
include_recipe "build-essential"
node_packages = value_for_platform(
[ "debian", "ubuntu" ] => { "default" => [ "libssl-dev" ] },
[ "amazon", "centos", "fedora", "centos" ] => { "default" => [ "openssl-devel" ] },
"default" => [ "libssl-dev" ]
)
node_packages.each do |node_package|
package node_package do
action :install
end
end
bash "install-node" do
cwd Chef::Config[:file_cache_path]
code <<-EOH
tar -xzf node-v#{node["nodejs"]["version"]}.tar.gz
(cd node-v#{node["nodejs"]["version"]} && ./configure --prefix=#{node["nodejs"]["dir"]} && make && make install)
EOH
action :nothing
not_if "#{node["nodejs"]["dir"]}/bin/node --version 2>&1 | grep #{node["nodejs"]["version"]}"
end
remote_file "#{Chef::Config[:file_cache_path]}/node-v#{node["nodejs"]["version"]}.tar.gz" do
source node["nodejs"]["url"]
checksum node["nodejs"]["checksum"]
notifies :run, resources(:bash => "install-node"), :immediately
end
It successfully installed nodejs on my Vagrant VM but on restart it's getting executed again. How do I prevent this? I'm not that good in reading ruby code.
To make the remote_file resource idempotent (i.e. to not download a file already present again) you have to correctly specify the checksum of the file. You do this in your code using the node["nodejs"]["checksum"] attribute. However, this only works if the checksum is correctly specified as the SHA256 hash of the downloaded file; no other algorithm (esp. not MD5) is supported.
If the checksum is not correct, your recipe will still work. However, on the next run, Chef will notice that the checksum of the existing file is different from the one you specified and will download the file again, which notifies the install-node resource and triggers the whole compile step again.
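To check the value that goes into node["nodejs"]["checksum"], the SHA-256 digest of the tarball can be computed with Ruby's standard library. The sketch below is only an illustration: sha256_of and needs_download? are hypothetical helper names, and needs_download? is a simplified model of the behavior described above, not Chef's actual implementation.

```ruby
require 'digest'

# SHA-256 of a file as a lowercase hex string.
def sha256_of(path)
  Digest::SHA256.file(path).hexdigest
end

# Simplified model of the decision described above: fetch the file when it
# is missing or when its digest differs from the expected one.
def needs_download?(path, expected_sha256)
  return true unless File.exist?(path)

  sha256_of(path) != expected_sha256.downcase
end

# When run as a script, print the digest of every file given as an argument.
if __FILE__ == $PROGRAM_NAME
  ARGV.each { |path| puts "#{sha256_of(path)}  #{path}" }
end
```

Running the script against the downloaded tarball prints the digest, which can be compared with the attribute value. If the two differ, that would explain the repeated download and rebuild.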
With chef, it's important that recipes be idempotent. That means that they should be able to run over and over again without changing the outcome. Chef expects to be able to run all the recipes on a node periodically, and that should be ok.
Do you have a way of knowing which resource within that recipe is causing you problems? The remote_file one is the only one I'm suspicious of being non-idempotent, but I'm not sure offhand.
Looking at the Chef wiki, I find this:
Deprecated Behavior In Chef 0.8.x and earlier, Remote File is also
used to fetch files from the files/ directory in a cookbook. This
behavior is now provided by #Cookbook File, and use of Remote File for
this purpose is deprecated (though still valid) in Chef 0.9.0 and
later.
Anyway, the way chef tends to work, it will look to see if whatever "#{Chef::Config[:file_cache_path]}/node-v#{node["nodejs"]["version"]}.tar.gz" resolves to exists, and if it does, it should skip that resource. Is it possible that install-node deletes that file when it's finished installing? If so, chef will re-fetch it every time.
You can run a recipe just once by overriding the run list with the -o option.
sudo chef-client -o "recipe[cookbook::recipe]"
-o RunlistItem,RunlistItem..., Replace current run list with specified items
--override-runlist
In my experience remote_file always runs when executing chef-client, even if the target file already exists. I'm not sure why (haven't dug into the Chef code to find the exact cause of the bug), though.
You can always write a not_if or only_if to control the execution of the remote_file resource, but usually it's harmless to just let it run every time.
The rest of your code looks like it's already idempotent, so there's no harm in running the client repeatedly.
There's an action you can specify for remote_file that will make it run conditionally:
remote_file 'target' do
source 'wherever'
action :create_if_missing
end
See the docs.
If you want to test whether your recipe is idempotent, you may be interested in ToASTER, a framework for systematic testing of Chef scripts.
http://cloud-toaster.github.io/
Chef recipes are executed with different configurations in isolated container environments (Docker VMs), and ToASTER reports various metrics such as system state changes, convergence properties, and idempotence issues.

Resources