OctopusDeploy - SQL Always-On Deployment

I've got an OctopusDeploy process for deploying to a database server.
The steps include copying db backups from a location to the local server, restoring them into SQL Server, then running a dacpac against them to upgrade them to a specific version.
This all works fine, but we've now added a new environment and I can't work out how to configure the deployment process for it.
Initially, the server was to be a Windows clustered environment, with the tentacle running as a clustered service (which meant a single deployment target).
However, the company setting up our servers couldn't get clustering to work for whatever reason, and have now given us something in between:
We have two servers, each with the tentacle installed, configured and running on it.
Each tentacle has a unique thumbprint, and both are always running and accessible.
On the Windows servers, SQL Server has been installed and configured as "Always On", with one server being the primary and the other the secondary.
The idea being that if the primary dies, the secondary picks up the pieces and runs fine.
Conceptually, this works for us, as we have a "clustered" ip for the SQL server connection and our web app won't notice the difference.
(It's important to note that I CANNOT change this setup - it's a case of working with what we're given....)
Now, in Octopus, I need to ONLY deploy to one of the servers in this environment, as if I were to deploy to both, I'd either be duplicating the task (if run as a rolling deployment), or worse, have conflicting deployments (if run asynchronously).
I initially tried adding a secondary role to each server ("PrimaryNode", "SecondaryNode"), but I then discovered Octopus treats roles as an "or" rather than an "and", so this wouldn't work for us out of the box.
I then looked at writing a PowerShell script that checked whether the machine that had the roles "DBServer" AND "PrimaryNode" had a status of "Online" and a health of "Healthy", then set an output variable based on the result:
##CONFIG##
$APIKey = "API-OBSCURED"
$MainRole = "DBServer"
$SecondaryRole = "PrimaryNode"

$roles = $OctopusParameters['Octopus.Machine.Roles'] -split ","
$enableFailoverDeployment = $false

foreach ($role in $roles)
{
    if ($role -eq "FailoverNode")
    {
        # This is the failover node - check if the primary node is up and running
        Write-Host "This is the failover database node. Checking if primary node is available before attempting deployment."

        $OctopusURL = "https://myOctourl" ##$OctopusParameters['Octopus.Web.BaseUrl']
        $EnvironmentID = $OctopusParameters['Octopus.Environment.Id']
        $header = @{ "X-Octopus-ApiKey" = $APIKey }

        # Fetch the environment and all machines registered against it
        $environment = (Invoke-WebRequest -UseBasicParsing "$OctopusURL/api/environments/$EnvironmentID" -Headers $header).content | ConvertFrom-Json
        $machines = ((Invoke-WebRequest -UseBasicParsing ($OctopusURL + $environment.Links.Machines) -Headers $header).content | ConvertFrom-Json).items

        # Narrow down to the machine(s) holding both the main role and the primary-node role
        $MachinesInRole = $machines | ?{ $MainRole -in $_.Roles }
        $MachinesInRole = $MachinesInRole | ?{ $SecondaryRole -in $_.Roles }

        $measure = $MachinesInRole | measure
        $total = $measure.Count

        if ($total -gt 0)
        {
            $currentMachine = $MachinesInRole[0]
            $machineUri = $currentMachine.URI

            if ($currentMachine.Status -eq "Online")
            {
                if ($currentMachine.HealthStatus -eq "Healthy")
                {
                    Write-Host "Primary node is online and healthy."
                    Write-Host "Setting flag to disable failover deployment."
                    $enableFailoverDeployment = $false
                }
                else
                {
                    Write-Host "Primary node has a health status of $($currentMachine.HealthStatus)."
                    Write-Host "Setting flag to enable failover deployment."
                    $enableFailoverDeployment = $true
                }
            }
            else
            {
                Write-Host "Primary node has a status of $($currentMachine.Status)."
                Write-Host "Setting flag to enable failover deployment."
                $enableFailoverDeployment = $true
            }
        }

        break;
    }
}

Set-OctopusVariable -name "EnableFailoverDeployment" -value $enableFailoverDeployment
This seemingly works - I can tell if I should deploy to the primary OR the secondary.
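(For reference, a later script step would read that flag back with the standard Octopus output-variable syntax - the step name "Check Primary Node" below is just a placeholder, not the real step name:)
$flag = $OctopusParameters["Octopus.Action[Check Primary Node].Output.EnableFailoverDeployment"]
if ($flag -eq "True")
{
    Write-Host "Failover deployment is enabled for this run."
}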
However, I'm now stuck at how I get the deployment process to use this.
Obviously, if the primary node is offline, then the deployment won't happen on it anyway.
Yet, if BOTH tentacles are online and healthy, then octopus will just attempt to deploy to them.
The deployment process contains about 12 unique steps, and is successfully used in several other environments (all single-server configurations), but as mentioned, now needs to ALSO deploy to a weird active/warm environment.
Any ideas how I might achieve this?
(If only you could specify "AND" in roles..)
UPDATE 1
I've now found that you can update a specific machine's "IsDisabled" flag via the web API, so I added code to the end of the above to enable/disable the secondary node depending on the outcome, instead of setting an output variable.
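The call in question is essentially a GET/modify/PUT against the machine resource - roughly like this (a sketch, not the exact code; it reuses $OctopusURL, $header and $enableFailoverDeployment from the script above):
# The secondary/failover node is the machine this script runs on
$machineId = $OctopusParameters['Octopus.Machine.Id']
$machine = (Invoke-WebRequest -UseBasicParsing "$OctopusURL/api/machines/$machineId" -Headers $header).Content | ConvertFrom-Json

# Disable the secondary node unless the primary is down and a failover deployment is needed
$machine.IsDisabled = -not $enableFailoverDeployment
$body = $machine | ConvertTo-Json -Depth 10
Invoke-WebRequest -UseBasicParsing -Method Put -Uri "$OctopusURL/api/machines/$machineId" -Headers $header -Body $body | Out-Null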
Whilst this does indeed update the machine's status, it doesn't actually affect the ongoing deployment process.
If I stop and restart the whole process, the machine is correctly picked up as enabled/disabled accordingly, but again, if its status changes DURING the deployment, Octopus doesn't appear to be "smart" enough to recognise this, ruling this option out.
(I did try adding a healthcheck step before and after this script to see if that made a difference, but whilst the healthcheck realised the machine was disabled, it still made no difference to the rest of the steps)
UPDATE 2
I've now also found the "ExcludedMachineIds" property of the "Deployment" in the API, but I get a 405 (not allowed) error when trying to update it once a deployment is in process.
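For reference, the attempt was along these lines (again a sketch using the same $OctopusURL/$header values; $machineId is the node to exclude):
$deploymentId = $OctopusParameters['Octopus.Deployment.Id']
$deployment = (Invoke-WebRequest -UseBasicParsing "$OctopusURL/api/deployments/$deploymentId" -Headers $header).Content | ConvertFrom-Json
$deployment.ExcludedMachineIds += $machineId
$body = $deployment | ConvertTo-Json -Depth 10
# This PUT is the call that comes back with 405 once the deployment is already running
Invoke-WebRequest -UseBasicParsing -Method Put -Uri "$OctopusURL/api/deployments/$deploymentId" -Headers $header -Body $body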
gah.. why can't this be easy?

OK - so the route we took with this was to have a script run against the clustered Always-On SQL instance, which identifies the primary node, as follows:
SELECT TOP 1 hags.primary_replica
FROM sys.dm_hadr_availability_group_states hags
INNER JOIN sys.availability_groups ag
ON ag.group_id = hags.group_id
WHERE ag.name = '$alwaysOnClusterInstance';
This allowed me to get the hostname of the primary server.
I then took the decision to include the hostname in the actual display name of the machine within OctopusDeploy.
I then do a simple "like" comparison with PowerShell between the result of the above SQL and the current machine display name ($OctopusParameters['Octopus.Machine.Name']).
If there's a match, then I set an output variable from this step equal to the internal ID of the OctopusDeploy machine ($OctopusParameters['Octopus.Machine.Id']).
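Pulled together, that detection step amounts to something like the following sketch (Invoke-Sqlcmd is assumed to be available on the tentacle, and $clusteredSqlAddress plus the "PrimaryNodeMachineId" variable name are placeholders):
$query = @"
SELECT TOP 1 hags.primary_replica
FROM sys.dm_hadr_availability_group_states hags
INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id
WHERE ag.name = '$alwaysOnClusterInstance';
"@
$primaryReplica = (Invoke-Sqlcmd -ServerInstance $clusteredSqlAddress -Query $query).primary_replica

# The machine display name contains the hostname, so a wildcard match is enough
if ($OctopusParameters['Octopus.Machine.Name'] -like "*$primaryReplica*")
{
    Set-OctopusVariable -name "PrimaryNodeMachineId" -value $OctopusParameters['Octopus.Machine.Id']
}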
Finally, at the start of each step, I simply compare the current machine ID against the above-mentioned output variable to determine whether I am on the primary node or a secondary node, and act accordingly (usually by exiting the step immediately if it's a secondary node).
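The guard at the top of each step then boils down to something like this (again a sketch; "Identify Primary Node" and PrimaryNodeMachineId are the placeholder step and variable names from the sketch above):
$primaryMachineId = $OctopusParameters["Octopus.Action[Identify Primary Node].Output.PrimaryNodeMachineId"]
if ($OctopusParameters['Octopus.Machine.Id'] -ne $primaryMachineId)
{
    Write-Host "Not the primary node - nothing to do on this machine."
    exit 0
}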
The last thing to note is that every single step where I care what machine the step is being run on has to be run as a "Rolling step", with a window size of 1.
Luckily, as we are just usually exiting if we're not on the primary node, this doesn't add any real time to our deployment process.

Related

Terraform azurerm_virtual_machine_extension error "extension operations are disallowed"

I have written a Terraform template that creates an Azure Windows VM. I need to configure the VM to enable PowerShell Remoting so the release pipeline is able to execute PowerShell scripts. After the VM is created I can RDP to the VM and do everything I need to do to enable PowerShell Remoting; however, it would be ideal if I could script all of that so it could be executed in a release pipeline. There are two things that prevent that.
The first, and the topic of this question, is that I have to run "WinRM quickconfig". I have the template working such that when I RDP to the VM after creation and run "WinRM quickconfig", I receive the following responses:
WinRM service is already running on this machine.
WinRM is not set up to allow remote access to this machine for management.
The following changes must be made:
Configure LocalAccountTokenFilterPolicy to grant administrative rights remotely to local users.
Make these changes [y/n]?
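(For context, the change quickconfig is asking for is just a registry write, which can also be made directly from PowerShell - the same change the CustomScript extension below attempts to run:)
# Equivalent of answering "y" to the LocalAccountTokenFilterPolicy prompt above
Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' `
    -Name 'LocalAccountTokenFilterPolicy' -Value 1 -Type DWord -Force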
I want to configure the VM in Terraform so LocalAccountTokenFilterPolicy is set and it becomes unnecessary to RDP to the VM to run "WinRM quickconfig". After some research it appeared I might be able to do that using the azurerm_virtual_machine_extension resource. I add this to my template:
resource "azurerm_virtual_machine_extension" "vmx" {
name = "hostname"
location = "${var.location}"
resource_group_name = "${var.vm-resource-group-name}"
virtual_machine_name = "${azurerm_virtual_machine.vm.name}"
publisher = "Microsoft.Azure.Extensions"
type = "CustomScript"
type_handler_version = "2.0"
settings = <<SETTINGS
{
# "commandToExecute": "powershell Set-ItemProperty -Path 'HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Policies\\System' -Name 'LocalAccountTokenFilterPolicy' -Value 1 -Force"
}
SETTINGS
}
When I apply this, I get the error:
Error: compute.VirtualMachineExtensionsClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="This operation cannot be performed when extension operations are disallowed. To allow, please ensure VM Agent is installed on the VM and the osProfile.allowExtensionOperations property is true."
I couldn't find any Terraform documentation that addresses how to set the allowExtensionOperations property to true. On a whim, I tried adding the property "allow_extension_operations" to the os_profile block in the azurerm_virtual_machine resource, but it is rejected as an invalid property. I also tried adding it to the os_profile_windows_config block, and it isn't valid there either.
I found a statement on Microsoft's documentation regarding the osProfile.allowExtensionOperations property that says:
"This may only be set to False when no extensions are present on the virtual machine."
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.compute.models.osprofile.allowextensionoperations?view=azure-dotnet
This implies to me that the property is True by default, but it doesn't actually say that and it certainly isn't acting like that. Is there a way in Terraform to set osProfile.allowExtensionOperations to true?
I'm running into the same issue adding extensions using Terraform. I created a Windows 2016 custom image, with:
provider "azurerm" version = "2.0.0"
Terraform 0.12.24
Terraform apply error:
compute.VirtualMachineExtensionsClient#CreateOrUpdate: Failure sending request: StatusCode=0
-- Original Error: autorest/azure: Service returned an error.
Status=<nil>
Code="OperationNotAllowed"
Message="This operation cannot be performed when extension operations are disallowed. To allow, please ensure VM Agent is installed on the VM and the osProfile.allowExtensionOperations property is true."
I ran into the same error; the solution depends on two things here.
You have to use provider "azurerm" version = "2.5.0", and you also have to set the os_profile_windows_config block (see below) in the virtual machine resource, so that Terraform will consider the extensions that you are passing. This fixed my errors.
os_profile_windows_config {
  provision_vm_agent = true
}

Is there any way to restart an Azure classic cloud service role at a regular interval?

I'm working with an Azure cloud service (classic) that has a couple role processes. One of them is a worker that's become a little unstable after a week so I want to restart it every few days. Eventually the worker role will be made stable but in the meantime it would be nice to auto-restart it every few days if possible.
Is there a way to restart an Azure classic cloud service worker role every day or so? Programmatically or via configuration?
Absolutely yes - there are two ways to restart an Azure classic Cloud Service role instance programmatically on an interval.
Call the Reboot Role Instance REST API from code, using a cron-style trigger.
You can also restart the role's Virtual Machines by calling the Virtual Machines - Restart REST API from code, or use the equivalent feature of the Azure SDK for your programming language.
I asked this question on the Azure forum and on Reddit.
The first response was at the Azure Forum,
Marcin said:
You can use for this purpose Azure Automation
https://learn.microsoft.com/en-us/azure/cloud-services/automation-manage-cloud-services
https://gallery.technet.microsoft.com/scriptcenter/Reboot-Cloud-Service-PaaS-b337a06d
Then on Reddit, quentech said:
You can do it with a PowerShell Workflow Runbook:
workflow ResetRoleClassic
{
    Param
    (
        [Parameter (Mandatory = $true)]
        [string]$serviceName,

        [Parameter (Mandatory = $true)]
        [string]$slot,

        [Parameter (Mandatory = $true)]
        [string]$instanceName
    )

    $ConnectionAssetName = "AzureClassicRunAsConnection"

    # Get the connection
    $connection = Get-AutomationConnection -Name $connectionAssetName

    # Authenticate to Azure with certificate
    Write-Verbose "Get connection asset: $ConnectionAssetName" -Verbose
    $Conn = Get-AutomationConnection -Name $ConnectionAssetName
    if ($Conn -eq $null)
    {
        throw "Could not retrieve connection asset: $ConnectionAssetName. Assure that this asset exists in the Automation account."
    }

    $CertificateAssetName = $Conn.CertificateAssetName
    Write-Verbose "Getting the certificate: $CertificateAssetName" -Verbose
    $AzureCert = Get-AutomationCertificate -Name $CertificateAssetName
    if ($AzureCert -eq $null)
    {
        throw "Could not retrieve certificate asset: $CertificateAssetName. Assure that this asset exists in the Automation account."
    }

    Write-Verbose "Authenticating to Azure with certificate." -Verbose
    Set-AzureSubscription -SubscriptionName $Conn.SubscriptionName -SubscriptionId $Conn.SubscriptionID -Certificate $AzureCert
    Select-AzureSubscription -SubscriptionId $Conn.SubscriptionID

    Write-Verbose "Getting $serviceName Role." -Verbose
    $results = Get-AzureRole -ServiceName $serviceName -InstanceDetails
    Write-Output $results

    Write-Verbose "Resetting Role Instance $instanceName" -Verbose
    $results = Reset-AzureRoleInstance -ServiceName $serviceName -Slot $slot -InstanceName $instanceName -Reboot
    Write-Output $results
}
I made some minor changes to the parameters and removed the outer braces, and was thus able to use the script as is for the most part.
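To have it run every few days, the runbook can then be tied to an Automation schedule - roughly like this with the Az.Automation cmdlets (the resource group, account, service and instance names below are placeholders):
# Create a schedule that fires every 3 days and link it to the runbook
$schedule = New-AzAutomationSchedule -ResourceGroupName "MyRG" -AutomationAccountName "MyAutomation" `
    -Name "RestartWorkerEvery3Days" -StartTime (Get-Date).AddHours(1) -DayInterval 3

Register-AzAutomationScheduledRunbook -ResourceGroupName "MyRG" -AutomationAccountName "MyAutomation" `
    -RunbookName "ResetRoleClassic" -ScheduleName $schedule.Name `
    -Parameters @{ serviceName = "MyCloudService"; slot = "Production"; instanceName = "WorkerRole_IN_0" }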

Missing values from JMeter results file when run remotely

When running a test which makes use of the JMeter-Plugins listener Response Times vs Threads or Active Threads Over Time, running the test plan remotely produces a results file which is missing the values used to plot the actual graph; however, when run locally all values are returned. E.g. when using Response Times vs Threads:
Example of a local result:
1383659591841,59,Example 1,200,OK,Example 1 1-579,text,true,183,22,22,59
Example of a remote result:
1383659859149,43,Example 1,200,OK,Example 1 1-575,text,true,183,43
Note that the last two fields are missing.
I would check the script definition on the two servers: maybe the "Write results to file" configuration has been changed.
Take the local .jmx file and copy it to the remote server.
Also, look for differences in the "# Results file configuration" section of the jmeter.properties file.
Make sure that on all of the slave/remote servers the jmeter.properties file within $JMETER_HOME/bin has the following setting:
jmeter.save.saveservice.thread_counts=true
By default this is set to false (and commented out).
For more information:
JMeter Plugins Installation

Informatica error 1417 :: Task not yet registered with this service process

I am getting the following error while running a workflow in Informatica.
Session task instance [worklet.session] : [TM_6775 The master DTM process was unable to connect to the master service process to update the session status with the following message: error message [ERROR: The session run for [Session task instance [worklet.session]] and [ folder id = 206, workflow id = 16042, workflow run id = 65095209, worklet run id = 65095337, task instance id = 13272 ] is not yet registered with this service process.] and error code [1417].]
This error comes randomly for many other sessions when they are run through the workflow as a whole. However, if I "start task" the failed task next time, it runs successfully.
Any help is much appreciated.
Just an idea to try if you use versioning: check that everything is checked in correctly. If the mapping, workflow or worklet is checked out, then you and Informatica will run different versions, which may cause the behaviour to differ when you start it manually.
Informatica will always use the checked-in version and you will always use the checked-out version.

wsadmin + jython restart WAS appserver

Is it possible to stop/start a WAS appserver using wsadmin (Jacl/Jython)? I want to delete all caches on the profile and then restart the WAS appserver. I'm using wsadmin standalone.
From wsadmin you may issue a command (using Jython):
AdminControl.invoke(AdminControl.queryNames('WebSphere:*,type=Server,node=%s,process=%s' % ('YourNodeName', 'YourServerName')), 'restart')
works with WAS Base & ND.
With ND you have another option:
AdminControl.invoke(AdminControl.queryNames('WebSphere:*,type=Server,node=%s,process=%s' % ('YourNodeName', 'YourServerName')), 'stop')
# now your server is stopped, you can do any cleanup
# and then start the server with NodeAgent
AdminControl.invoke(AdminControl.queryNames('WebSphere:*,type=NodeAgent,node=%s' % 'YourNodeName'), 'launchProcess', ['YourServerName'], ['java.lang.String'])
Check out the wsadminlib script. It has over 500 methods written for you to perform specific wsadmin tasks. Also check out the related wsadminlib blog - you'll definitely want to view the PowerPoint on this site to get an overview of usage.
You don't specify which caches you would like to clear. If you want to clear dynacache, wsadminlib offers clearDynaCache, clearAllProxyCaches, and others as well as server restart methods.
Example usage:
import sys

execfile('/opt/software/portalsoftware/wsadminlib/wsadminlib.py')

clearAllProxyCaches()

for (nodename, servername) in listAllAppServers():
    # dynacachename must be set to the name of the cache instance to clear
    clearDynaCache(nodename, servername, dynacachename)
    save()
    maxwaitseconds = 300
    restartServer(nodename, servername, maxwaitseconds)
