Azure Python SDK: How to delete a list of resources that have interdependencies? - azure-sdk

I have some code that uses the Python Azure SDK to deploy a virtual machine within a resource group. I manually provision each resource in order (a vnet and subnet if necessary, a public IP address, a NIC, and finally the VM itself).
Now, when I want to delete the VM, I can query the list of resources within the resource group and filter that list in my code to match only those resources which have a tag with the matching value.
The problem is that you can't just arbitrarily delete resources that have dependencies. For example, I cannot delete the NIC because it is in use by the virtual machine; I can't delete the OS disk because it's also in use by the VM; I can't delete the public IP address because it's assigned to the NIC; etc.
In the Azure portal you can check off a list of resources and ask the portal to delete all of them, and it handles any resource inter-dependencies for you, but it looks like this is not possible from the SDK.
Right now my only solution is to be fully aware of the path of resource creation and dependency within my code itself. I have to work backwards - first, search the list for VMs with the right tag, delete them, then search for disks with the tag, delete them, NICs, and so on down the line. But this has a lot of room for error and is not in any way reusable for other types of resources.
The only other alternative I can think of is "try to delete it and handle errors" but there's a lot of ugly edge cases I could see happening here and I'd rather take a less haphazard way of handling this, especially since we're deleting things.
TL;DR: Is there a proper way to take a list of resources and query Azure to determine which other resources depend on them? (This could be done one resource at a time, but it would still be best to have it be "generic", i.e. able to do this for any resource without necessarily knowing that resource's type up front.)
The resource group contains other resources as well which are related to the same project (e.g. other VMs, a storage account, etc.) so deleting an entire resource group is NOT an option.

One workaround you can try is using the Azure CLI from PowerShell together with tags. Add a tag to the resources that you want to delete and then use the commands below to delete them in bulk.
$resources = az resource list --tag Key=Value | ConvertFrom-Json
foreach ($resource in $resources) {
    # --ids already identifies the resource fully, so no --resource-group is passed
    az resource delete --ids $resource.id --verbose
}
This will delete the tagged resources regardless of the location or resource group in which they were created.
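The loop above deletes resources in whatever order the list is returned, so a NIC may be attempted before the VM that still uses it. Below is a minimal Python sketch of the "work backwards" ordering described in the question, using the standard-library graphlib module. The DELETE_FIRST map is a hypothetical, hand-maintained assumption covering only the resource types named in the question; it is not queried from Azure, and deletion_order is an illustrative helper, not an SDK function.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption: hand-written map of resource type -> types that must be deleted
# before it (its dependents). This is NOT retrieved from Azure.
DELETE_FIRST = {
    "microsoft.network/networkinterfaces": {"microsoft.compute/virtualmachines"},
    "microsoft.compute/disks": {"microsoft.compute/virtualmachines"},
    "microsoft.network/publicipaddresses": {"microsoft.network/networkinterfaces"},
    "microsoft.network/virtualnetworks": {"microsoft.network/networkinterfaces"},
}


def deletion_order(resources):
    """Sort resources (dicts with 'id' and 'type') so that dependents come first."""
    present = {r["type"].lower() for r in resources}
    # Only keep edges between types that actually occur in this list.
    graph = {t: DELETE_FIRST.get(t, set()) & present for t in present}
    # static_order() yields each type after all types that must go before it.
    rank = {t: i for i, t in enumerate(TopologicalSorter(graph).static_order())}
    return sorted(resources, key=lambda r: rank[r["type"].lower()])


tagged = [
    {"id": "pip-1", "type": "Microsoft.Network/publicIPAddresses"},
    {"id": "nic-1", "type": "Microsoft.Network/networkInterfaces"},
    {"id": "disk-1", "type": "Microsoft.Compute/disks"},
    {"id": "vm-1", "type": "Microsoft.Compute/virtualMachines"},
]
for resource in deletion_order(tagged):
    print(resource["type"], resource["id"])
```

The sorted list can then be passed to whatever delete call you already use, one resource at a time, waiting for each delete to finish before starting the next. Types missing from the map are treated as having no dependents, so the map still has to be extended by hand for other resource types.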

Related

Terraform : How to fetch or destroy resources created by other means?

Sometimes I end up creating resources using the AWS console, due to errors in Terraform or for lack of time. Can I list all my resources and destroy them? Basically, I want discovery and management of existing cloud resources.
Example: list my EC2 instances using Terraform and destroy them when needed. How can I achieve this?
Terraform is designed to ignore any existing objects that it didn't create, because otherwise it would be risky to adopt Terraform in an existing system with many existing objects, and it would be impossible to decompose the infrastructure into different configurations for each subsystem without each one trying to destroy the objects being managed by the others.
Terraform doesn't have any facility for automatically detecting objects created outside of Terraform, but you can explicitly bind specific objects from your remote system to resource instances in your Terraform configuration using the terraform import command.
That command has some safeguards to try to prevent accidentally immediately deleting an object you've just imported if e.g. you make a typo of the resource instance address, and so unfortunately the design of this command is contrary to your goal: it won't let you just import something and run terraform apply to destroy it.
Instead, you'd need to:
1. Write a stub empty resource block for a resource of the appropriate type in your configuration.
2. Run terraform import to bind your existing real object to that empty resource block.
3. After the import succeeds, immediately remove the resource block to tell Terraform that you intend to delete the object.
4. Run terraform apply, and then Terraform should notice that it's tracking an object that is no longer mentioned in the configuration and propose to delete it.
Terraform is not the best tool for this job because it has essentially been designed to do the exact opposite of what you want to do, because typically users want to avoid destroying untracked objects to avoid disrupting neighboring systems.
However, you may be able to get the effect you want with some custom programming on your part, by writing a program that does something like the following:
1. Run terraform show -json in all of your configuration working directories to obtain a machine-readable description of the Terraform state in each one.
2. Decode the JSON state descriptions to find all of the resource instances of type aws_instance and collect a set of all of their id attribute values. This is the set of instances to keep.
3. Call the EC2 API DescribeInstances action to retrieve a list of all of the instances that actually exist. Collect a set of all of their IDs. This is the set of instances that exist.
4. Set-subtract the set of instances to keep from the set of instances that exist. The result is the set of instances to destroy.
5. If the set of instances to destroy isn't empty, call the EC2 API's TerminateInstances action to terminate every instance ID in that set.
This description is specific to Amazon EC2 instances. The same pattern could apply to objects of any other type, but there is no general solution that will work across all object types at once because the AWS API design doesn't work that way: each object type has its own separate operations for querying which objects exist and for destroying a particular object or set of objects.
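The EC2-specific steps above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a finished tool. The function names are hypothetical. The state dictionaries are assumed to be the decoded output of terraform show -json. The ec2_client argument is assumed to be a boto3 EC2 client (boto3 is a third-party package that would need to be installed and configured separately).

```python
import json


def managed_instance_ids(state):
    """Collect the id of every aws_instance in decoded `terraform show -json` output."""
    ids = set()

    def walk(module):
        for res in module.get("resources", []):
            if res.get("type") == "aws_instance":
                instance_id = res.get("values", {}).get("id")
                if instance_id:
                    ids.add(instance_id)
        # Resources declared in nested modules live under child_modules.
        for child in module.get("child_modules", []):
            walk(child)

    walk(state.get("values", {}).get("root_module", {}))
    return ids


def existing_instance_ids(ec2_client):
    """Collect all instance IDs returned by DescribeInstances, page by page."""
    ids = set()
    for page in ec2_client.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                ids.add(instance["InstanceId"])
    return ids


def instances_to_destroy(existing, keep):
    """Set-subtract the instances to keep from the instances that exist."""
    return existing - keep


def load_states(paths):
    """Read files that were saved from `terraform show -json` (hypothetical helper)."""
    states = []
    for path in paths:
        with open(path) as f:
            states.append(json.load(f))
    return states
```

A caller would union managed_instance_ids over every working directory's state, compute instances_to_destroy, review the result, and only then pass it to the client's terminate_instances(InstanceIds=[...]) call. Printing the set first and terminating as a separate, deliberate step seems prudent given that the operation is destructive.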

Clean Up Azure Machine Learning Blob Storage

I manage a frequently used Azure Machine Learning workspace with several experiments and active pipelines. Everything is working well so far. My problem is getting rid of old data from runs, experiments and pipelines. Over the last year the blob storage has grown to an enormous size, because the data of every pipeline run is stored.
I have deleted older runs from experiments using the GUI, but the actual pipeline data in the blob store is not deleted. Is there a smart way to clean up data in the blob store from runs that have been deleted?
On one of the countless Microsoft support pages, I found the following not very helpful post:
*Azure does not automatically delete intermediate data written with OutputFileDatasetConfig. To avoid storage charges for large amounts of unneeded data, you should either:
Programmatically delete intermediate data at the end of a pipeline run, when it is no longer needed
Use blob storage with a short-term storage policy for intermediate data (see Optimize costs by automating Azure Blob Storage access tiers)
Regularly review and delete no-longer-needed data*
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-move-data-in-out-of-pipelines#delete-outputfiledatasetconfig-contents-when-no-longer-needed
Any idea is welcome.
Have you tried applying an Azure storage account management policy to that storage account?
You could either change the tier of the blob from hot -> cold -> archive and thereby reduce costs, or even configure an auto-delete policy after a set number of days.
Reference : https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview#sample-rule
If you use Terraform to manage your resources, this should be available as well.
Reference : https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/storage_management_policy
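resource "azurerm_storage_management_policy" "example" {
  storage_account_id = "<azureml-storage-account-id>"

  rule {
    name    = "rule2"
    enabled = true # the rule has no effect unless it is enabled
    filters {
      prefix_match = ["pipeline"]
      blob_types   = ["blockBlob"] # assumption: block blobs; recent azurerm versions require this argument
    }
    actions {
      base_blob {
        delete_after_days_since_modification_greater_than = 90
      }
    }
  }
}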
Similar option is available via the portal settings as well.
Hope this helps!
Currently facing this exact problem. The most sensible approach is to enforce retention schedules at the storage account level. These are the steps you can follow:
1. Identify which storage account is linked to your AML instance and pull it up in the Azure portal.
2. Under Settings / Configuration, ensure you are using StorageV2 (which has the desired functionality).
3. Under Data management / Lifecycle management, create a new rule that targets your problem containers.
NOTE - I do not recommend a blanket enforcement policy against the entire storage account, because any registered datasets, models, compute info, notebooks, etc. will all be targeted for deletion as well. Instead, use the prefix arguments to declare relevant paths such as: storageaccount1234 / azureml / ExperimentRun
Here is the documentation on Lifecycle management:
https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview?tabs=azure-portal
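For those who prefer to script the rule rather than click through the portal, here is a minimal Python sketch that builds a lifecycle management policy document in the JSON shape described in the documentation linked above. The helper name lifecycle_policy, the rule name, and the 90-day threshold are illustrative assumptions. The prefix azureml/ExperimentRun (container name followed by path) is taken from the example path above and should be checked against your own storage account before use.

```python
import json


def lifecycle_policy(rule_name, prefixes, delete_after_days):
    """Build a policy that deletes block blobs under `prefixes` once they are old enough."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        # Restrict the rule to specific paths instead of the whole account.
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": list(prefixes),
                    },
                    "actions": {
                        "baseBlob": {
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days
                            }
                        }
                    },
                },
            }
        ]
    }


# Assumed values for illustration only.
policy = lifecycle_policy("delete-old-experiment-runs", ["azureml/ExperimentRun"], 90)
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied with az storage account management-policy create, passing the file through its --policy argument. Starting with a narrow prefix and a long retention period leaves room to confirm that nothing still in use is being removed.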

Get DNS infos for local machine interfaces

I need the DNS suffix of all my local interfaces on my PC.
Is there a way to achieve this in Go?
Best case would be for any OS
Necessary: working on Windows
I have tried net.Interfaces() and all the other net functions, but I haven't found anything regarding the DNS server.
EDIT
I have found the solution for the Windows-specific version but it would be interesting if there is anything that works for Linux and macOS too.
I don't think there is a solution that works for every OS. On Linux the DNS suffix is not interface-specific but system-wide; it is configured in /etc/resolv.conf. Here is an excerpt from the man page:
search Search list for host-name lookup.
By default, the search list contains one entry, the local domain name. It is determined from the local hostname returned by gethostname(2); the local domain name is taken to be everything after the first '.'. Finally, if the hostname does not contain a '.', the root domain is assumed as the local domain name.
This may be changed by listing the desired domain search path following the search keyword with spaces or tabs separating the names. Resolver queries having fewer than ndots dots (default is 1) in them will be attempted using each component of the search path in turn until a match is found.
For environments with multiple subdomains please read options ndots:n below to avoid man-in-the-middle attacks and unnecessary traffic for the root-dns-servers. Note that this process may be slow and will generate a lot of network traffic if the servers for the listed domains are not local, and that queries will time out if no server is available for one of the domains.
If there are multiple search directives, only the search list from the last instance is used.
Go's standard library net package parses this file to build its DNS config, so the DNS resolver should behave as expected; however, the parsing functionality is not exposed.
The GetSearchDomains func in the libnetwork library (in its resolvconf package) should be able to help you out. If there are no search entries in /etc/resolv.conf, you should use the hostname, which you can get with the os.Hostname func.
I believe this also works on FreeBSD and macOS since they are both Unix-like, but I am not 100% sure.
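To make the resolv.conf rules quoted above concrete, here is a small sketch of the lookup logic. It is written in Python purely for illustration; the same steps port directly to Go with bufio.Scanner and os.Hostname. The helper name search_domains is hypothetical. As a simplifying assumption it only handles the search directive and the hostname fallback, and it ignores the domain directive and the LOCALDOMAIN environment variable.

```python
def search_domains(resolv_conf_text, hostname):
    """Return the DNS search list implied by resolv.conf text and the local hostname."""
    domains = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (which start with '#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "search":
            # Only the last search directive counts, so overwrite earlier ones.
            domains = fields[1:]
    if domains:
        return domains
    # Fallback: the local domain is everything after the first '.' of the hostname.
    _, dot, rest = hostname.partition(".")
    return [rest] if dot and rest else []


sample = """\
# example file contents, made up for this sketch
nameserver 192.0.2.1
search first.example
search corp.example lab.example
"""
print(search_domains(sample, "pc1.corp.example"))
print(search_domains("nameserver 192.0.2.1\n", "pc1.corp.example"))
```

On a real Unix-like machine, the inputs would be the contents of /etc/resolv.conf and the value returned by the hostname call. On Windows this file does not exist, so the Windows-specific approach mentioned in the question is still needed there.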

Good practice GCE + Windows: computer name

I have some Windows Server 2016 instances on GCE (for Jenkins agents).
I'm wondering what is the best/good practice when it comes to computer name.
Currently, when I want to create a new node, I clone an instance (create images from disks + create template + create instance from template).
On this clone, I change the computer name (in Windows) so that it has the same name as on GCE. Is it useful? recommended? bad? needed?
I know that the name of the Jenkins node needs to be the same as the name of the GCE instance (to be picked up easily). However, I don't think the Windows computer name matters.
So, should I pick an identical generic name for all of them? A prefix+random generated name? Continue with the instance=computer=node name?
The node name that I use in Jenkins is always retrieved from env.NODE_NAME (when needed), so that should not break any pipeline. I'm not sure though, as I may be missing something (internal to Jenkins).
Bonus question: After cloning, I have to do some modifications on the clone for Perforce (p4) to work.
I temporarily set some env variables
I duplicate the workspace: p4 client -t prefix-buildX-suffix prefix-buildY-suffix
I set up the stream (not sure if doable in one step)
Then regenerate the list of files: p4 sync -k <root_folder_to_be_generated>/...#YYYY/MM/DD
So, here also there's a name prefix-buildY-suffix which is the same as the one from the instance=computer=node (buildY). It may be a separate question, but as it's still from the same context, I'm putting it here: should I recreate a new workspace all the time? Knowing that it's on several machines, I'd say yes. Otherwise, I "imagine" that p4 would have contradictory information about the state of this workspace. So, here also, I currently need to customize the name. So, even if I make the Windows computer name generic, I would still need to customize the p4 workspace name, wouldn't I?
Jenkins must have the same computer name as the one on the network.
So, all three names must be identical.

existdb: identify database server

We have a number of (developer) existDb database servers, and some staging/production servers.
Each has its own configuration, and these differ slightly.
We need to select which configuration to load and use in queries.
The configuration is to be stored in an XML file within the repository.
However, when syncing the content of the servers, a single burnt-in XML file is not sufficient, since it is overwritten during copying from the other server.
For this, we need the physical name of the actual database server.
The only function we found, request:get-server-name, is not reliable for this, since a single eXist server can be accessed through a number of different (localhost, intranet or external) URLs. Relying on it leads to unnecessary duplication of the configuration, one copy for each external URL.
(Accessing some local files in the file system is not secure and not fast.)
How to get the physical name of the existDb server from XQuery?
I'm sorry, but I don't fully understand your question. Are you talking about eXist's default conf.xml or your own configuration file that you need to store in a VCS repo? Should the XQuery be executed on one instance and trigger an event in all others, or just some, or...? Without some code it is difficult to see why and when something gets overwritten.
You could try console:jmx-token, which does not vary depending on the URL (at least it shouldn't).
You might also find it much easier to use a Docker-based approach, either with multiple instances coordinated via docker-compose, or to keep the individual configs from interfering with each other when moving from dev to staging to production: https://github.com/duncdrum/exist-docker
If I understand correctly, you basically want to be able to get the hostname or the IP address of a server from XQuery. If the functions in the XQuery Request module are not doing as you wish, then another option would be to set a Java System Property when starting eXist-db. This system property could be the internal DNS name or IP of your server, for example: -Dour-server-name=server1.mydomain.com
From XQuery you could then read that Java System property using util:system-property("our-server-name").
