Chef-provisioning not picking up convergence options - ruby

I am working with chef-provisioning to provision a set of Windows Server 2012 VMs using the following convergence_options:
convergence_options: {
  chef_config: "ssl_verify_mode :verify_none", # String containing additional text to inject into client.rb
  chef_version: '12.18.31',
  install_msi_url: 'https://packages.chef.io/files/stable/chef/12.18.31/windows/2012r2/chef-client-12.18.31-1-x64.msi',
  ignore_failure: [259, 35, 37]
}
Per the documentation, the ignore_failure property should ignore failures for the specified exit codes; however, the property appears to have no effect at all.
Convergence failures on provisioned machines (from non-zero exit codes on reboot) are still stopping the entire provisioning operation.
================================================================================
Error executing action `converge` on resource 'machine[dvps01]'
================================================================================
RuntimeError
------------
Error: command '$env:path = [System.Environment]::GetEnvironmentVariable('PATH', 'MACHINE');chef-client -l auto' exited with code 259.
Any thoughts?

After some further investigation, simply passing ignore_failure as a convergence option does not seem to be sufficient.
I had to make sure that the driver.rb included:
require 'chef/provisioning/convergence_strategy/ignore_convergence_failure'
I also consulted tests in the chef-provisioning repo to see how they expected the ignore_failure functionality to work. In ignore_convergence_failure_spec.rb we see the following:
let(:test_class) do
  t = TestConvergeClass.new(convergence_options, test_error)
  t.extend(Chef::Provisioning::ConvergenceStrategy::IgnoreConvergenceFailure)
  t
end
To that end, I added the following to convergence_strategy_for in driver.rb:
machine.extend(Chef::Provisioning::ConvergenceStrategy::IgnoreConvergenceFailure)
All of this resulted in the expected behavior of ignore_failure such that client converge failures were ignored.
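The effect of extending the object can be illustrated in plain Ruby. This is only a minimal sketch with hypothetical stand-ins (strategy_class and ignore_failure_mixin are invented for illustration and are not chef-provisioning's own classes). It shows how extend places a module's converge in front of the original method, so that failures with listed exit codes are swallowed while other failures still propagate.

```ruby
# Hypothetical stand-in for a convergence strategy; not chef-provisioning code.
strategy_class = Class.new do
  def initialize(ignored_codes)
    @ignored_codes = ignored_codes
  end

  # Pretend to run chef-client and fail on any non-zero exit code.
  def converge(exit_code)
    raise "chef-client exited with code #{exit_code}" unless exit_code.zero?

    'converged'
  end
end

# Hypothetical mixin playing the role of IgnoreConvergenceFailure.
ignore_failure_mixin = Module.new do
  def converge(exit_code)
    super
  rescue RuntimeError => e
    # Re-raise unless this exit code was listed in ignore_failure.
    raise unless @ignored_codes.include?(exit_code)

    "ignored: #{e.message}"
  end
end

strategy = strategy_class.new([259, 35, 37])
strategy.extend(ignore_failure_mixin) # same mechanism as the extend call above

ignored_result = strategy.converge(259)
clean_result = strategy.converge(0)
puts ignored_result
puts clean_result
```

Without the extend call the object only has the original converge, which would explain why setting ignore_failure alone had no visible effect in my case.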

Related

Chef: Why am I not reading in the attribute value I just set?

I am getting my toes wet with Chef at my job and have been tasked with writing a recipe to install telegraf on our machines with custom configs. Let me also preface this by saying I have no Ruby experience.
Before downloading or installing telegraf, I want to check whether telegraf is already installed, and only do all the following work if the versions mismatch.
So I have attempted to set an attribute at recipe runtime that other resources will check against.
ruby_block 'get telegraf version' do
  block do
    #tricky way to load this Chef::Mixin::ShellOut utilities
    Chef::Resource::RubyBlock.send(:include, Chef::Mixin::ShellOut)
    command = 'C:\\Program Files\\telegraf\\telegraf.exe --version'
    command_out = shell_out(command)
    node.default['windows']['telegraf']['installed_version'] = 'good'
  end
  notifies :write, 'log[log_version]', :delayed
  action :run
  only_if { ::File.exists?('C:\\Program Files\\telegraf\\telegraf.exe')}
end
log 'log_version' do
  message node['windows']['telegraf']['installed_version']
  level :error
end
When I look at the output though I see
* ruby_block[get telegraf version] action run[2018-07-23T14:48:11-07:00] INFO: Processing ruby_block[get telegraf version] action run (win-telegraf::telegraf line 26)
[2018-07-23T14:48:11-07:00] INFO: ruby_block[get telegraf version] called
- execute the ruby block get telegraf version
* log[log_version] action write[2018-07-23T14:48:11-07:00] INFO: Processing log[log_version] action write (win-telegraf::telegraf line 39)
[2018-07-23T14:48:11-07:00] ERROR:
So why is it when I read node['windows']['telegraf']['installed_version'] that the log prints nothing instead of 'good'?
Chef uses a two-pass loading system, check out https://coderanger.net/two-pass/ for more details. But the tl;dr for this case is that the stuff inside block do ... end runs in the second phase, while the Ruby code for the log resource is evaluated in the first phase. In general you can fix this using the lazy{} helper, but in this case what you probably want is either a custom resource or an Ohai plugin. For "normal" Windows applications, this is all handled by the MSI subsystem and the windows_package resource, but as Telegraf doesn't offer MSI packages you are a bit out of luck. That said, there are packages available for Chocolatey (a Windows packaging system like Mac's Homebrew) so you might want to look into using that instead of writing this yourself.
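To make the ordering concrete, here is a small plain-Ruby simulation of the two phases. It is only a sketch: node, converge_steps and the lambdas are hypothetical stand-ins and not Chef's real internals. The comment in the middle shows how the deferred read would look in a recipe with the lazy helper.

```ruby
# Plain-Ruby simulation of the two phases; none of this is Chef's real code.
node = {}            # stand-in for node attributes
converge_steps = []  # work queued during the compile phase

# --- Phase 1: compile. Resource bodies are evaluated now. ---
# ruby_block: the block body is only queued, not run.
converge_steps << -> { node['installed_version'] = 'good' }

# log with a plain value: the attribute is read immediately (still unset).
eager_message = node['installed_version']

# log with a deferred value, the role lazy { } plays in a recipe:
#   message lazy { node['windows']['telegraf']['installed_version'] }
lazy_message = -> { node['installed_version'] }

# --- Phase 2: converge. Queued work runs in order. ---
converge_steps.each(&:call)
resolved_message = lazy_message.call

puts eager_message.inspect
puts resolved_message.inspect
```

The eager read happens before the queued block has run, which matches the empty ERROR line in the question's output; the deferred read sees the value because it is evaluated after the block.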

Chef - ArgumentError: too short control escape

I would be glad to get any help with the following issue:
when I run numerous recipes together (running each one separately doesn't fail), I sometimes get the following error:
"ArgumentError: too short control escape"
log:
[2016-03-15T15:41:55+01:00] INFO: Running queued delayed notifications before re-raising exception
[2016-03-15T15:41:55+01:00] ERROR: Running exception handlers
[2016-03-15T15:41:55+01:00] ERROR: Exception handlers complete
[2016-03-15T15:41:55+01:00] FATAL: Stacktrace dumped to c:/chef/chef-stacktrace.out
[2016-03-15T15:41:55+01:00] FATAL: ArgumentError: too short control escape
chef-stacktrace.out:
Generated at 2016-03-14 15:56:29 +0100
ArgumentError: too short control escape
C:/opscode/chef/embedded/apps/chef/lib/chef/formatters/error_inspectors/resource_failure_inspector.rb:66:in 'recipe_snippet'
C:/opscode/chef/embedded/apps/chef/lib/chef/formatters/error_inspectors/resource_failure_inspector.rb:43:in 'add_explanation'
It happens randomly and I can't find an explanation.
Thanks
I'm guessing something is going wonky with the regexp compile. It's supposed to use Regexp.escape(source) but something might be slipping through? Please include the full error output though.
After a deep investigation, we found the root cause of the issue. The name of the GitHub repository was interpreted by Chef as an escape character (the repository name started with a capital letter "C"), which caused the configuration to fail intermittently.
This applies to Chef version 12.0.3 (I hope they fixed it in a newer version).
We changed the name of the repository and that solved the problem.
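The difference Regexp.escape makes can be sketched in plain Ruby. The path below is a hypothetical example (the real repository name is not given here), and the assumption is that the name ended up after a backslash inside a pattern, where "\C" not followed by "-" is read as an incomplete control escape. The exact exception class may vary with how the pattern is built, so the sketch rescues both RegexpError and ArgumentError.

```ruby
# Hypothetical Windows-style path; the real repository name is not known.
source = 'C:\repos\Cookbooks'

# Unescaped: the regexp engine is expected to read "\C" as the start of a
# control escape and reject the pattern.
raw_error = begin
  Regexp.new(source)
  nil
rescue RegexpError, ArgumentError => e
  e.message
end

# Escaped: every backslash is matched literally, so the pattern is valid.
safe_pattern = Regexp.new(Regexp.escape(source))
matches = safe_pattern.match?(source)

puts raw_error.inspect
puts matches
```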

How to debug Errno::EIO error in Chef recipe using Chef::Provider::Git

I'm trying to use chef to check out a git repo to a windows client node.
This seems simple enough and I've got the following resource definition:
git "C:\\pathtocheckout" do
  repo "https://gitserver/repo.git"
  action [ :checkout, :sync]
end
But when this is reached by chef-client I get:
Errno::EIO: git[C:\pathtocheckout] (cookbook_name::test line 21) had an error: Errno::EIO: Input/output error - CreateProcessW
I've had a look at the stacktrace produced and it appears to be something to do with creating a process to run the git command - but this is the limit of my knowledge.
I've made sure git is installed and on the PATH, removed all other recipes from the run list, tried running as a different admin user, and tried different repositories, but all with the same error.
So I'm pretty stumped - anyone got a way I can dig into this error and see what is going on?

Chef - finding the missing attribute NilClass

Context - We have a massive amount of Chef attributes to perform our install, something like 3000+ have now been defined and change per environment.
Problem - Sometimes a Chef recipe will reference a non-existent attribute node[:mystuff][:typo]. This results in the following error:
Recipe Compile Error in /var/chef/cache/cookbooks/<yyy>/recipes/something.rb
undefined method '[]' for nil:NilClass
This is a worthless error because it doesn't let me know exactly what node/attribute is missing. Even running with chef-client -l debug doesn't help. knife cookbook test <x> doesn't help because syntactically it is correct. Is there a way to get it to print out the exact line number that is causing the error? The recipe may contain 10s or 100s of attributes so it is a huge time waster going through line by line to discover a typo.
I wrote Chef Sugar's deep_fetch method precisely for this reason.
The error you are getting is just the by-product of Ruby hashes. For more information on deep_fetch, you can also see my blog post on the subject: https://sethvargo.com/delicious-new-chef-sugars/
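The idea behind it can be sketched in plain Ruby. The helper below, deep_fetch_sketch, is a hypothetical illustration and not Chef Sugar's implementation: it walks the keys one at a time and raises an error that names the path up to the first missing key, instead of letting [] be called on nil.

```ruby
# Hypothetical helper illustrating the idea; not Chef Sugar's own code.
def deep_fetch_sketch(attributes, *keys)
  keys.each_with_index.reduce(attributes) do |current, (key, index)|
    # Stop at the first level that is not a hash or lacks the key.
    unless current.respond_to?(:key?) && current.key?(key)
      raise KeyError, "missing attribute at #{keys[0..index].inspect}"
    end

    current[key]
  end
end

attributes = { mystuff: { port: 8080 } } # stand-in for node attributes

found = deep_fetch_sketch(attributes, :mystuff, :port)

missing_message = begin
  deep_fetch_sketch(attributes, :mystuff, :typo, :deeper)
rescue KeyError => e
  e.message
end

puts found
puts missing_message
```

With plain node[:mystuff][:typo][:deeper] the failure would surface as the undefined method '[]' for nil:NilClass error from the question; here the message points at the key that does not exist.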

Running OpenMPI on Windows XP

I'm trying to build a simple cluster based on Windows XP. I compiled OpenMPI-1.4.2 successfully, and tools like mpicc and ompi_info work too, but I can't get my mpirun working properly. The only output I can see is
Z:\>orterun --hostfile z:\hosts.txt -np 2 hostname
[host0:04728] Failed to initialize COM library. Error code = -2147417850
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\mca\ess\hnp\ess_hnp_module.c at line 218
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_plm_init failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2
\orte\runtime\orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_set_name failed
--> Returned value Error (-1) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\..\openmpi
-1.4.2\orte\tools\orterun\orterun.c at line 543
Where z:\hosts.txt appears as follows:
host0
host1
Z: is a shared network drive available to both host0 and host1.
What is my problem and how do I fix it?
Upd:
Ok, this problem seems to be fixed. It seems that the WideCap driver and/or its software components cause this error to appear. A "clean" machine runs a local task successfully. However, I still cannot run a task across 2 machines; I get the following message:
Z:\>mpirun --hostfile z:\hosts.txt -np 2 hostname
connecting to host1
username:MAIN\cluster
password:********
Save Credential?(Y/N) y
[host0:04728] This feature hasn't been implemented yet.
[host0:04728] Could not connect to namespace cimv2 on node host1. Error code =-2147217400
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error.
More information may be available above.
--------------------------------------------------------------------------
I googled a little and did all the things as described here: http://www.open-mpi.org/community/lists/users/2010/03/12355.php but I'm still getting the same error. Can anyone help me?
Upd2:
Error code -2147217400 might be the WMI error WBEM_E_INVALID_PARAMETER (0x80041008), which occurs when one of the parameters passed to the WMI call is not correct. Does this mean that the problem is in the OpenMPI source code itself? Or maybe it's because of the wrong/outdated wincred.h and credui.lib I used while building OpenMPI from the source code?
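The mapping between the signed decimal codes in the output and the hex form used in Windows documentation can be checked with a few lines of Ruby. This is just an arithmetic sketch; hresult_hex is a hypothetical helper name.

```ruby
# Convert a signed 32-bit error code into its unsigned hex representation.
def hresult_hex(code)
  format('0x%08X', code & 0xFFFFFFFF)
end

wmi_code = hresult_hex(-2147217400) # from "Could not connect to namespace cimv2"
com_code = hresult_hex(-2147417850) # from "Failed to initialize COM library"

puts wmi_code
puts com_code
```

The first value comes out as 0x80041008, which is the hex code quoted above for WBEM_E_INVALID_PARAMETER.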
