Is there any way to restart an Azure classic cloud service role at a regular interval? - azure-cloud-services

I'm working with an Azure cloud service (classic) that has a couple of role processes. One of them is a worker that becomes a little unstable after about a week, so I want to restart it every few days. Eventually the worker role will be made stable, but in the meantime it would be nice to auto-restart it every few days if possible.
Is there a way to restart an Azure classic cloud service worker role every day or so, either programmatically or via configuration?

Yes, there are two ways to restart an Azure classic Cloud Service role instance programmatically on an interval.
You can call the Reboot Role Instance REST API from code that runs on a schedule (for example, a cron-style trigger).
Alternatively, you can restart the role's virtual machines by calling the Virtual Machines - Restart REST API, or by using the equivalent operation in the Azure SDK for your programming language.
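As a rough sketch of the first option, the Reboot Role Instance operation of the classic Service Management API can be called from a scheduled PowerShell job. This is only an illustration: the URI format and x-ms-version value are assumptions based on the classic Service Management documentation and should be checked against the Reboot Role Instance reference, and the subscription, service, slot, instance and certificate values are placeholders.

# Sketch: reboot a classic role instance via the Service Management REST API (placeholder values)
$subscriptionId = "<subscription-id>"
$serviceName    = "<cloud-service-name>"
$slot           = "production"
$instanceName   = "<role-instance-name>"

# A management certificate uploaded to the subscription and installed in the local store
$cert = Get-Item "Cert:\CurrentUser\My\<certificate-thumbprint>"

$uri = "https://management.core.windows.net/$subscriptionId/services/hostedservices/" +
       "$serviceName/deploymentslots/$slot/roleinstances/${instanceName}?comp=reboot"

# Empty POST body; the service accepts the request and reboots the instance asynchronously
Invoke-WebRequest -Uri $uri -Method Post -Certificate $cert -Headers @{ "x-ms-version" = "2014-06-01" }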

I asked this question on the Azure forum and on Reddit.
The first response came on the Azure forum, where Marcin said:
You can use Azure Automation for this purpose:
https://learn.microsoft.com/en-us/azure/cloud-services/automation-manage-cloud-services
https://gallery.technet.microsoft.com/scriptcenter/Reboot-Cloud-Service-PaaS-b337a06d
Then on Reddit, quentech said:
You can do it with a PowerShell Workflow Runbook:
workflow ResetRoleClassic
{
    Param
    (
        [Parameter (Mandatory = $true)]
        [string]$serviceName,

        [Parameter (Mandatory = $true)]
        [string]$slot,

        [Parameter (Mandatory = $true)]
        [string]$instanceName
    )

    $ConnectionAssetName = "AzureClassicRunAsConnection"

    # Get the classic Run As connection and authenticate to Azure with its certificate
    Write-Verbose "Get connection asset: $ConnectionAssetName" -Verbose
    $Conn = Get-AutomationConnection -Name $ConnectionAssetName
    if ($Conn -eq $null)
    {
        throw "Could not retrieve connection asset: $ConnectionAssetName. Assure that this asset exists in the Automation account."
    }

    $CertificateAssetName = $Conn.CertificateAssetName
    Write-Verbose "Getting the certificate: $CertificateAssetName" -Verbose
    $AzureCert = Get-AutomationCertificate -Name $CertificateAssetName
    if ($AzureCert -eq $null)
    {
        throw "Could not retrieve certificate asset: $CertificateAssetName. Assure that this asset exists in the Automation account."
    }

    Write-Verbose "Authenticating to Azure with certificate." -Verbose
    Set-AzureSubscription -SubscriptionName $Conn.SubscriptionName -SubscriptionId $Conn.SubscriptionID -Certificate $AzureCert
    Select-AzureSubscription -SubscriptionId $Conn.SubscriptionID

    Write-Verbose "Getting $serviceName Role." -Verbose
    $results = Get-AzureRole -ServiceName $serviceName -InstanceDetails
    Write-Output $results

    Write-Verbose "Resetting Role Instance $instanceName" -Verbose
    $results = Reset-AzureRoleInstance -ServiceName $serviceName -Slot $slot -InstanceName $instanceName -Reboot
    Write-Output $results
}
I made some minor changes to the parameters and removed the outer workflow braces, and I was able to use the script mostly as-is.
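To have the runbook fire every few days, it can be linked to a recurring Azure Automation schedule. A minimal sketch, assuming the Az.Automation cmdlets are available and using placeholder resource group, Automation account, service and instance names:

# Sketch: run the ResetRoleClassic runbook every 3 days (names are placeholders)
$rg      = "my-automation-rg"
$account = "my-automation-account"

# Create a recurring schedule that starts tomorrow and repeats every 3 days
$schedule = New-AzAutomationSchedule -ResourceGroupName $rg -AutomationAccountName $account `
    -Name "ResetWorkerEvery3Days" -StartTime (Get-Date).Date.AddDays(1) -DayInterval 3

# Link the schedule to the runbook and supply the runbook parameters
Register-AzAutomationScheduledRunbook -ResourceGroupName $rg -AutomationAccountName $account `
    -RunbookName "ResetRoleClassic" -ScheduleName $schedule.Name `
    -Parameters @{ serviceName = "<cloud-service-name>"; slot = "Production"; instanceName = "<role-instance-name>" }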

Related

Does "SAM Local Invoke" support EFS?

I'm using Lambda to access EFS as described at https://docs.aws.amazon.com/lambda/latest/dg/configuration-filesystem.html
The Lambda function works fine when running in AWS, but it fails when run with SAM's "local invoke" command. The error is:
2020-10-02T20:03:19.389Z 09b6f1b2-d80a-15e1-9531-f74182e95c1e ERROR Invoke Error
{
    "errorType": "Error",
    "errorMessage": "ENOENT: no such file or directory, open '/mnt/efs/newfile.txt'",
    "code": "ENOENT",
    "errno": -2,
    "syscall": "open",
    "path": "/mnt/efs/newfile.txt",
    "stack": [
        "Error: ENOENT: no such file or directory, open '/mnt/efs/newfile.txt'",
        "    at Object.openSync (fs.js:458:3)",
        "    at Object.writeFileSync (fs.js:1355:35)",
        "    at WriteFile (/var/task/src/apis/permissions/isallowed.js:70:8)",
        "    at IsAllowedInPolicy (/var/task/src/apis/permissions/isallowed.js:52:5)",
        "    at Runtime.exports.handler (/var/task/src/apis/permissions/isallowed.js:16:28)",
        "    at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
    ]
}
Is "sam local invoke" supposed to work with EFS?
The answer is no.
I opened a support ticket with AWS and was told
This is a limitation of the AWS SAM CLI and not your configuration.
Therefore, I have taken the initiative to submit an internal feature
request with our internal service team (specifically the AWS SAM CLI
service team) on your behalf, and I have added your company name and
voice to this request. At the moment, we would not be able to provide
an estimate on if or when this feature will be supported. I would
advise checking the AWS announcements page from time to time for future
service updates. https://aws.amazon.com/new/
I also discovered that someone submitted a feature request on GitHub as a workaround.

Terraform azurerm_virtual_machine_extension error "extension operations are disallowed"

I have written a Terraform template that creates an Azure Windows VM. I need to configure the VM to enable PowerShell Remoting so that the release pipeline can execute PowerShell scripts. After the VM is created I can RDP to the VM and do everything I need to enable PowerShell Remoting, but ideally I could script all of that so it could be executed in a release pipeline. There are two things that prevent that.
The first, and the topic of this question, is that I have to run "WinRM quickconfig". With the template as it stands, when I RDP to the VM after creation and run "WinRM quickconfig", I receive the following response:
WinRM service is already running on this machine.
WinRM is not set up to allow remote access to this machine for management.
The following changes must be made:
Configure LocalAccountTokenFilterPolicy to grant administrative rights remotely to local users.
Make these changes [y/n]?
I want to configure the VM in Terraform so that LocalAccountTokenFilterPolicy is set and it becomes unnecessary to RDP to the VM to run "WinRM quickconfig". After some research it appeared I might be able to do that using the azurerm_virtual_machine_extension resource. I added this to my template:
resource "azurerm_virtual_machine_extension" "vmx" {
name = "hostname"
location = "${var.location}"
resource_group_name = "${var.vm-resource-group-name}"
virtual_machine_name = "${azurerm_virtual_machine.vm.name}"
publisher = "Microsoft.Azure.Extensions"
type = "CustomScript"
type_handler_version = "2.0"
settings = <<SETTINGS
{
# "commandToExecute": "powershell Set-ItemProperty -Path 'HKLM:\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Policies\\System' -Name 'LocalAccountTokenFilterPolicy' -Value 1 -Force"
}
SETTINGS
}
When I apply this, I get the error:
Error: compute.VirtualMachineExtensionsClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=<nil> Code="OperationNotAllowed" Message="This operation cannot be performed when extension operations are disallowed. To allow, please ensure VM Agent is installed on the VM and the osProfile.allowExtensionOperations property is true."
I couldn't find any Terraform documentation that addresses how to set the allowExtensionOperations property to true. On a whim, I tried adding an "allow_extension_operations" property to the os_profile block in the azurerm_virtual_machine resource, but it was rejected as an invalid property. I also tried adding it to the os_profile_windows_config block, and it isn't valid there either.
I found a statement on Microsoft's documentation regarding the osProfile.allowExtensionOperations property that says:
"This may only be set to False when no extensions are present on the virtual machine."
https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.management.compute.models.osprofile.allowextensionoperations?view=azure-dotnet
This implies to me that the property is true by default, but it doesn't actually say that, and it certainly isn't behaving like that. Is there a way in Terraform to set osProfile.allowExtensionOperations to true?
I ran into the same issue adding extensions using Terraform. I had created a Windows 2016 custom image, using:
provider "azurerm" version = "2.0.0"
Terraform 0.12.24
Terraform apply error:
compute.VirtualMachineExtensionsClient#CreateOrUpdate: Failure sending request: StatusCode=0
-- Original Error: autorest/azure: Service returned an error.
Status=<nil>
Code="OperationNotAllowed"
Message="This operation cannot be performed when extension operations are disallowed. To allow, please ensure VM Agent is installed on the VM and the osProfile.allowExtensionOperations property is true."
I ran into the same error; the solution depends on two things here.
You have to use provider "azurerm" version 2.5.0 (or later), and you also have to set the os_profile_windows_config block (see below) in the virtual machine resource so that the VM agent is provisioned and Terraform's extensions are allowed. This fixed my errors.
os_profile_windows_config {
  provision_vm_agent = true
}
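For reference, the command the custom script extension is being asked to run (taken from the question above) boils down to the PowerShell below; once the extension deploys successfully, the same snippet can be run interactively on the VM to confirm the policy was applied. The verification step is just a suggestion.

# Set LocalAccountTokenFilterPolicy so local accounts receive full admin tokens remotely
Set-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' `
    -Name 'LocalAccountTokenFilterPolicy' -Value 1 -Force

# Verify the value is now 1
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' |
    Select-Object LocalAccountTokenFilterPolicy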

VSTS CICD build pipeline failed after a new .pfx file with a later expiration date was added to the repo

I have created a CI/CD pipeline for a UWP project. The pipeline's PowerShell task, which runs the code below, failed with the error message: Error APPX0108: The certificate specified has expired. For more information about renewing certificates, see http://go.microsoft.com/fwlink/?LinkID=241478
So I created a new test certificate in Visual Studio 2017 with an expiration date of next year and pushed the new certificate (.pfx) file to the repo. I also updated the .csproj file with the name of the new certificate.
Now I get the same error on a build solution task in the pipeline. I would like to know what I am missing. Do I have to add that certificate to my build server as well?
PowerShell script code:
Param(
    [String]$pfxpath,
    [String]$password
)

if (-Not $pfxpath) {
    Write-Host "Certificate path not set"
    exit 1
}
if (-Not $password) {
    Write-Host "Password not set"
    exit 1
}

Add-Type -AssemblyName System.Security

# Load the .pfx and import it into the current user's personal certificate store
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2
$cert.Import($pfxpath, $password, [System.Security.Cryptography.X509Certificates.X509KeyStorageFlags]"PersistKeySet")

$store = New-Object System.Security.Cryptography.X509Certificates.X509Store -ArgumentList "MY", "CurrentUser"
$store.Open([System.Security.Cryptography.X509Certificates.OpenFlags]"ReadWrite")
$store.Add($cert)
$store.Close()
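As a quick sanity check before pushing a new certificate, you can confirm locally that the .pfx really carries the later expiry date the build will see. A sketch, with an assumed file name:

# Load the new .pfx (path is a placeholder); Get-PfxCertificate prompts for the password
$cert = Get-PfxCertificate -FilePath ".\MyApp_TemporaryKey.pfx"
$cert.NotBefore   # start of the validity period
$cert.NotAfter    # should now show next year's expiration date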

OctopusDeploy - SQL Always-On Deployment

I've got an OctopusDeploy process for deploying to a database server.
The steps include copying db backups from a location to the local server, restoring them into SQL Server, then running a dacpac against them to upgrade them to a specific version.
This all works fine, but we've now added a new environment and I can't work out how to configure the deployment process for it.
Initially, the server was to be a windows clustered environment, with the tentacle running as a clustered service (which meant a single deployment target).
However, the company setting up our servers couldn't get clustering to work for whatever reason, and have now given us something in between:
We have two servers, each with a tentacle installed, configured and running on it.
Each tentacle has a unique thumbprint, and both are always running and accessible.
On the Windows servers, SQL Server has been installed and configured as "Always On", with one server being the primary and the other the secondary.
The idea being that if the primary dies, the secondary picks up the pieces and runs fine.
Conceptually, this works for us, as we have a "clustered" ip for the SQL server connection and our web app won't notice the difference.
(It's important to note, I CANNOT change this setup - it's a case of work with what we're given....)
Now, in Octopus, I need to deploy to ONLY one of the servers in this environment; if I were to deploy to both, I'd either be duplicating the task (if run as a rolling deployment) or, worse, have conflicting deployments (if run asynchronously).
I initially tried adding a secondary role to each server ("PrimaryNode", "SecondaryNode"), but I then discovered Octopus treats roles as an "or" rather than an "and", so this wouldn't work for us out of the box.
I then looked at writing PowerShell scripts that check whether the machine with the roles "dbserver" AND "primarynode" has a status of "Online" and a health of "Healthy", and then set an output variable based on the result:
##CONFIG##
$APIKey = "API-OBSCURED"
$MainRole = "DBServer"
$SecondaryRole = "PrimaryNode"

$roles = $OctopusParameters['Octopus.Machine.Roles'] -split ","
$enableFailoverDeployment = $false

foreach ($role in $roles)
{
    if ($role -eq "FailoverNode")
    {
        # This is the failover node - check if the primary node is up and running
        Write-Host "This is the failover database node. Checking if primary node is available before attempting deployment."

        $OctopusURL = "https://myOctourl" ##$OctopusParameters['Octopus.Web.BaseUrl']
        $EnvironmentID = $OctopusParameters['Octopus.Environment.Id']
        $header = @{ "X-Octopus-ApiKey" = $APIKey }

        $environment = (Invoke-WebRequest -UseBasicParsing "$OctopusURL/api/environments/$EnvironmentID" -Headers $header).content | ConvertFrom-Json
        $machines = ((Invoke-WebRequest -UseBasicParsing ($OctopusURL + $environment.Links.Machines) -Headers $header).content | ConvertFrom-Json).items

        # Find the machine that has both the main role and the primary-node role
        $MachinesInRole = $machines | ?{ $MainRole -in $_.Roles }
        $MachinesInRole = $MachinesInRole | ?{ $SecondaryRole -in $_.Roles }

        $measure = $MachinesInRole | measure
        $total = $measure.Count

        if ($total -gt 0)
        {
            $currentMachine = $MachinesInRole[0]
            $machineUri = $currentMachine.URI

            if ($currentMachine.Status -eq "Online")
            {
                if ($currentMachine.HealthStatus -eq "Healthy")
                {
                    Write-Host "Primary node is online and healthy."
                    Write-Host "Setting flag to disable failover deployment."
                    $enableFailoverDeployment = $false
                }
                else
                {
                    Write-Host "Primary node has a health status of $($currentMachine.HealthStatus)."
                    Write-Host "Setting flag to enable failover deployment."
                    $enableFailoverDeployment = $true
                }
            }
            else
            {
                Write-Host "Primary node has a status of $($currentMachine.Status)."
                Write-Host "Setting flag to enable failover deployment."
                $enableFailoverDeployment = $true
            }
        }
        break;
    }
}

Set-OctopusVariable -name "EnableFailoverDeployment" -value $enableFailoverDeployment
This seemingly works - I can tell if I should deploy to the primary OR the secondary.
However, I'm now stuck at how I get the deployment process to use this.
Obviously, if the primary node is offline, then the deployment won't happen on it anyway.
Yet, if BOTH tentacles are online and healthy, then octopus will just attempt to deploy to them.
The deployment process contains about 12 unique steps, and is successfully used in several other environments (all single-server configurations), but as mentioned, now needs to ALSO deploy to a weird active/warm environment.
Any ideas how I might achieve this?
(If only you could specify "AND" in roles..)
UPDATE 1
I've now found that you can update a specific machine's "IsDisabled" flag via the web API, so I added code to the end of the above to enable/disable the secondary node depending on the outcome, instead of setting an output variable.
Whilst this does indeed update the machine's status, it doesn't actually affect the ongoing deployment process.
If I stop and restart the whole process, the machine is correctly picked up as enabled/disabled accordingly, but if its status changes DURING the deployment, Octopus doesn't appear to be "smart" enough to recognise this, ruling this option out.
(I did try adding a health check step before and after this script to see if that made a difference, but whilst the health check realised the machine was disabled, it still made no difference to the rest of the steps.)
Update 2
I've now also found the "ExcludedMachineIds" property of the "Deployment" in the API, but I get a 405 (not allowed) error when trying to update it once a deployment is in process.
gah.. why can't this be easy?
OK - so the route we took was to have a script run against the clustered Always-On SQL instance to identify the primary and secondary nodes, as follows:
SELECT TOP 1 hags.primary_replica
FROM sys.dm_hadr_availability_group_states hags
INNER JOIN sys.availability_groups ag
ON ag.group_id = hags.group_id
WHERE ag.name = '$alwaysOnClusterInstance';
This allowed me to get the hostname of the primary server.
I then took the decision to include the hostname in the actual display name of the machine within OctopusDeploy.
I then do a simple "like" comparison in PowerShell between the result of the above SQL and the current machine's display name ($OctopusParameters['Octopus.Machine.Name']).
If there's a match, then I set an output variable from this step equal to the internal ID of the OctopusDeploy machine ($OctopusParameters['Octopus.Machine.Id'])
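A sketch of that detection step, assuming the SqlServer module's Invoke-Sqlcmd is available on the deployment target (the listener address, availability group name and output variable name are placeholders):

# Ask the Always-On listener which host is currently the primary replica
$listener = "sql-listener.mydomain.local"
$alwaysOnClusterInstance = "MyAvailabilityGroup"

$query = @"
SELECT TOP 1 hags.primary_replica
FROM sys.dm_hadr_availability_group_states hags
INNER JOIN sys.availability_groups ag ON ag.group_id = hags.group_id
WHERE ag.name = '$alwaysOnClusterInstance';
"@

$primaryHost = (Invoke-Sqlcmd -ServerInstance $listener -Query $query).primary_replica

# The machine display name in Octopus includes the hostname, so a simple -like comparison works
if ($OctopusParameters['Octopus.Machine.Name'] -like "*$primaryHost*") {
    Set-OctopusVariable -name "PrimaryMachineId" -value $OctopusParameters['Octopus.Machine.Id']
}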
Finally, at the start of each step, I simply compare the current machine id against the above mentioned output variable to determine if I am on the primary node or a secondary node, and act accordingly (usually by exiting the step immediately if it's a secondary node)
The last thing to note is that every single step where I care which machine the step runs on has to be run as a "Rolling step", with a window size of 1.
Luckily, as we are just usually exiting if we're not on the primary node, this doesn't add any real time to our deployment process.
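The guard at the top of each rolling step then looks something like this (the step name "Find Primary Node" and the output variable name are placeholders for whatever the detection step is actually called):

# Skip this step unless we are on the machine the detection step flagged as the primary
$primaryId = $OctopusParameters["Octopus.Action[Find Primary Node].Output.PrimaryMachineId"]

if ($OctopusParameters['Octopus.Machine.Id'] -ne $primaryId) {
    Write-Host "Not the primary SQL node - exiting this step."
    exit 0
}

Write-Host "Running on the primary SQL node - continuing."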

Write-EventLog within Function over Remote PowerShell

The environment:
Server: Windows Server 2012 R2 with Remote PowerShell enabled.
Workstation: Windows 8.1
So I've created a PowerShell Module called MyModule.psm1 with the following function:
Function CreateEvent() {
    Write-EventLog -LogName ToolLog -Source "Schedule" -EntryType Information -EventId 13 -Message "There were users written to the database."
}
I created a PSSessionConfigurationFile and then registered it with a configuration name of EventLogging, so that I can open a remote PowerShell session via the following:
$Creds = Get-Credential
$SessionOpts = New-PSSessionOption -SkipCACheck -SkipCNCheck -SkipRevocationCheck
$Session = New-PSSession -ComputerName Server.Domain.Com -ConfigurationName EventLogging -Credential $Creds -UseSSL -SessionOption $SessionOpts
Import-PSSession $Session
Now, when I enter a local administrator's credentials at the Get-Credential prompt, I can run the CreateEvent function and everything works just fine. However, if I enter a standard local user's credentials, I get the error: The registry key for the log "ToolLog" for source "Schedule" could not be opened.
I replaced the Write-EventLog in the Function with:
$EventLog = new-object System.Diagnostics.EventLog("ToolLog");
$EventLog.MachineName = ".";
$EventLog.Source = "Schedule";
$EventLog.WriteEntry("There were users written to the database.", "Information", 15);
And I receive an error of: Exception calling "WriteEntry" with "3" argument(s): "Cannot open log for source 'Schedule'. You may not have write access."
If I log on to the server locally, import the module, and try to run the function, I get exactly the same errors. I also cannot run the Write-EventLog cmdlet by itself.
Based on all the information I found on the internet, I've given my local non-admin user write permissions to the event log, both through RegEdit and through NTFS permissions on the actual event log file.
Any ideas?
Thanks,
Brian
It's my understanding that only administrators can create new event logs. I'm not sure whether there is a way around this. I suggest creating the new event log on your server as an administrator ahead of time, so that the log already exists before non-administrators try to write to it.
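A minimal sketch of that pre-creation step, run once in an elevated session on the server (the log and source names are taken from the question):

# Run once as an administrator: create the custom log and register the event source
New-EventLog -LogName ToolLog -Source Schedule

# Optional: write a test entry to confirm the log and source now exist
Write-EventLog -LogName ToolLog -Source Schedule -EntryType Information -EventId 13 -Message "Test entry."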
