Consul API: Retrieve the list of service instances from all nodes

I've explored the service catalog, but all I can see are the service names, not the individual service instances.
For example, using the Consul API/CLI I can only see Service-A, but I want to see all of the instances under it:
Service-A-abc
Service-A-xyz
Service-A-123
How can I get all service instances from all nodes together?

You can't get this information directly.
The only way to get an entire list is to write a script that first enumerates the services by name (/v1/catalog/services) and then enumerates the instances of each service by name (/v1/catalog/service/<service_name>).
Rinse and repeat.
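A minimal sketch of that loop in Python against the Consul HTTP API (the agent address is an assumption, and ACL token handling is omitted):

import requests

CONSUL = "http://127.0.0.1:8500"  # assumed local agent; adjust for your setup

# 1. Enumerate all service names known to the catalog.
services = requests.get(f"{CONSUL}/v1/catalog/services").json()

# 2. For each name, enumerate its registered instances across all nodes.
for name in services:
    for inst in requests.get(f"{CONSUL}/v1/catalog/service/{name}").json():
        print(name, inst["ServiceID"], inst["Node"],
              inst["ServiceAddress"] or inst["Address"], inst["ServicePort"])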

Related

Terraform: How to fetch or destroy resources created by other means?

Sometimes I end up creating resources from the AWS console, either because of errors in Terraform or for lack of time. Can I list all my resources and destroy them? Basically, discovery of existing cloud resources and management of them.
For example: list my EC2 instances using Terraform and destroy them when needed. How can I achieve this?
Terraform is designed to ignore any existing objects that it didn't create. Otherwise it would be risky to adopt Terraform in an existing system with many existing objects, and it would be impossible to decompose the infrastructure into different configurations for each subsystem without each one trying to destroy the objects managed by the others.
Terraform doesn't have any facility for automatically detecting objects created outside of Terraform, but you can explicitly bind specific objects from your remote system to resource instances in your Terraform configuration using the terraform import command.
That command has some safeguards to prevent you from accidentally deleting an object you've just imported if, for example, you make a typo in the resource instance address. Unfortunately, the design of this command is contrary to your goal: it won't let you just import something and run terraform apply to destroy it.
Instead, you'd need to:
1. Write a stub empty resource block for a resource of the appropriate type in your configuration.
2. Run terraform import to bind your existing real object to that empty resource block.
3. After the import succeeds, immediately remove the resource block to tell Terraform that you intend to delete the object.
4. Run terraform apply; Terraform should notice that it's tracking an object that is no longer mentioned in the configuration and propose to delete it.
Terraform is not the best tool for this job: it has essentially been designed to do the exact opposite of what you want, because users typically want to avoid destroying untracked objects so that they don't disrupt neighboring systems.
However, you may be able to get the effect you want with some custom programming on your part, by writing a program that does something like the following:
1. Run terraform show -json in all of your configuration working directories to obtain a machine-readable description of the Terraform state in each one.
2. Decode the JSON state descriptions to find all of the resource instances of type aws_instance and collect a set of all of their id attribute values. This is the set of instances to keep.
3. Call the EC2 API's DescribeInstances action to retrieve a list of all of the instances that actually exist, and collect a set of all of their IDs. This is the set of instances that exist.
4. Set-subtract the set of instances to keep from the set of instances that exist. The result is the set of instances to destroy.
5. If the set of instances to destroy isn't empty, call the EC2 API's TerminateInstances action to terminate every instance ID in that set.
This description is specific to Amazon EC2 instances. The same pattern could apply to objects of any other type, but there is no general solution that will work across all object types at once because the AWS API design doesn't work that way: each object type has its own separate operations for querying which objects exist and for destroying a particular object or set of objects.
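A rough Python sketch of those steps for EC2 instances (the working-directory paths and region are assumptions, and it trusts your Terraform state as the source of truth for what to keep, so treat it as a starting point rather than something to run blindly):

import json
import subprocess

import boto3

WORKING_DIRS = ["./network", "./app"]  # your Terraform working directories (assumed)

def instance_ids_in_state(workdir):
    """Return the aws_instance IDs tracked by the Terraform state in one directory."""
    out = subprocess.check_output(["terraform", "show", "-json"], cwd=workdir)
    state = json.loads(out)

    ids = set()
    def walk(module):
        for res in module.get("resources", []):
            if res.get("type") == "aws_instance":
                ids.add(res["values"]["id"])
        for child in module.get("child_modules", []):
            walk(child)

    walk(state.get("values", {}).get("root_module", {}))
    return ids

# The set of instances to keep, according to Terraform.
keep = set()
for wd in WORKING_DIRS:
    keep |= instance_ids_in_state(wd)

# The set of instances that actually exist, according to EC2.
ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
exist = set()
for page in ec2.get_paginator("describe_instances").paginate():
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            if inst["State"]["Name"] not in ("shutting-down", "terminated"):
                exist.add(inst["InstanceId"])

# Everything that exists but isn't tracked by Terraform is a candidate for destruction.
to_destroy = sorted(exist - keep)
if to_destroy:
    ec2.terminate_instances(InstanceIds=to_destroy)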

SingleInstance() not working, but only on the cluster

I am building a solution that implements a RESTful service for interacting with metadata related to federated identity.
I have a class that is registered with Autofac like this:
builder.RegisterType<ExternalIdpStore>()
.As<IExternalIdpStore>()
.As<IStartable>()
.SingleInstance();
I have a service class (FedApiExtIdpSvc) that implements a service that is a dependency of an ASP.NET controller class, and that service class has IExternalIdpStore as a dependency. When I build and run my application from Visual Studio (in Debug mode), I get one instance of ExternalIdpStore injected and its constructor executes only once. When I initiate a controller action that ends up calling a particular method of my ExternalIdpStore class, it works just fine.
When my application is built via Azure DevOps (in Release mode) and deployed to a Kubernetes cluster running under Linux, I initially see one call to the ExternalIdpStore class's constructor right at application startup. When I initiate the same controller action as above, I see another call to ExternalIdpStore's constructor, and when the same method of the class is called it fails because the data store hasn't been initialized (it's initialized by the class's Start method, which implements IStartable).
I added a field to the class that is initialized in the constructor to a GUID so I can confirm whether I have two different instances on the cluster. I log this value in the constructor, in the startup code, and in the method eventually called when the controller action is initiated. Logging confirms that when I run from Visual Studio under Windows there is just one instance: the same GUID is logged in all three places. When it runs on the cluster under Linux, logging confirms that the first two log entries reference the same GUID, but the log entry from the method called when the controller action is initiated shows a different GUID, and a key object reference needed to access the data store is null.
One of my colleagues thought that I might have more than one registration. So I removed the explicit registration I showed above. The dependency failed to resolve when tested.
I am at a loss as to what to try next, or how I might add some additional logging to diagnose what is going on.
So here's what was going on:
The reason for getting two sets of log entries was that we have two Kubernetes clusters sending log entries to Splunk. This service was deployed to both. The sets of log entries were coming from pods in different clusters.
My code was creating a Cosmos DB account client without setting the connection mode, so it was defaulting to Direct mode.
The log entries that showed successful execution were for the cluster running in Azure - in Azure Kubernetes Service (AKS). Accessing the Cosmos DB account from AKS in direct connection mode was succeeding.
The log entries that were failing were running in our on-prem Kubernetes cluster. Attempting to connect to the Cosmos DB account was failing because it's on our corporate network which has security restrictions that were preventing direct connection mode from working.
The exception thrown when attempting to connect from our on-prem cluster was essentially "lost" because it was from a process running on a background thread.
Modifying the logic to add a try/catch around the connection attempt and pass the exception back to the caller allowed the exception related to direct connection mode failing to be logged.
Biggest lesson learned: When something "strange" or "odd" or "mysterious" or "unusual" is happening, start looking at your code from the perspective of where it could be throwing an exception that isn't caught - especially if you have background processes!

Azure Python SDK: How to delete a list of resources that have interdependencies?

I have some code that uses the Python Azure SDK to deploy a virtual machine within a resource group. I manually provision each resource in order (a vnet and subnet if necessary, a public IP address, a NIC, and finally the VM itself).
Now, when I want to delete the VM, I can query the list of resources within the resource group and filter that list in my code to match only those resources which have a tag with the matching value.
The problem is that you can't just arbitrarily delete resources that have dependencies. For example, I cannot delete the NIC because it is in use by the virtual machine; I can't delete the OS disk because it's also in use by the VM; I can't delete the public IP address because it's assigned to the NIC; etc.
In the Azure portal you can check off a list of resources and ask the portal to delete all of them, and it handles any resource inter-dependencies for you, but it looks like this is not possible from the SDK.
Right now my only solution is to be fully aware of the path of resource creation and dependency within my code itself. I have to work backwards - first, search the list for VMs with the right tag, delete them, then search for disks with the tag, delete them, NICs, and so on down the line. But this has a lot of room for error and is not in any way reusable for other types of resources.
The only other alternative I can think of is "try to delete it and handle errors", but there are a lot of ugly edge cases I can see happening there, and I'd rather take a less haphazard approach, especially since we're deleting things.
TL;DR: Is there a proper way to take a list of resources and query Azure to determine which other resources depend on them? (This could be done one resource at a time, but it would still be best for it to be generic, i.e. able to do this for any resource without necessarily knowing that resource's type up front.)
The resource group contains other resources as well which are related to the same project (e.g. other VMs, a storage account, etc.) so deleting an entire resource group is NOT an option.
One workaround you can try is using tags together with the Azure CLI from PowerShell. Add a tag to each of the resources you want to delete, then use the commands below to delete them in bulk.
# List every resource carrying the tag, then delete each one by its id.
$resources = az resource list --tag Key=Value | ConvertFrom-Json
foreach ($resource in $resources) {
    az resource delete --resource-group $resource.resourceGroup --ids $resource.id --verbose
}
This will delete the matching resources regardless of the location or the resource group in which they were created.
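If you would rather stay in Python, a rough equivalent of the same tag-based approach with the azure-mgmt-resource SDK might look like the sketch below. The tag name/value and subscription ID are placeholders, the api-version lookup only handles top-level resource types, and older SDK versions expose delete_by_id rather than begin_delete_by_id. It simply retries failed deletions so that dependency ordering sorts itself out over successive passes.

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

def pick_api_version(resource_id):
    """Look up a usable api-version for a resource from its provider metadata."""
    # Id format: /subscriptions/.../providers/<namespace>/<type>/<name>
    # (nested/child resource types would need more careful parsing).
    namespace, resource_type = resource_id.split("/providers/")[1].split("/")[:2]
    provider = client.providers.get(namespace)
    for t in provider.resource_types:
        if t.resource_type.lower() == resource_type.lower():
            stable = [v for v in t.api_versions if "preview" not in v]
            return (stable or t.api_versions)[0]
    raise ValueError(f"No api-version found for {resource_id}")

# Collect the tagged resources, then keep retrying deletions until either
# everything is gone or nothing more can be deleted; failures caused by
# dependencies tend to succeed on a later pass once their dependents are gone.
to_delete = [r.id for r in client.resources.list(
    filter="tagName eq 'project' and tagValue eq 'demo'")]  # assumed tag

while to_delete:
    failed = []
    for rid in to_delete:
        try:
            client.resources.begin_delete_by_id(rid, pick_api_version(rid)).result()
        except Exception:
            failed.append(rid)
    if len(failed) == len(to_delete):
        raise RuntimeError(f"Could not delete: {failed}")
    to_delete = failed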

Ansible and AWS EC2 inventory

I'd like to get EC2 instances metadata with Ansible and do something with those instances based on the metadata. However, ec2_facts wants to SSH into instances in order to get the metadata.
I believe it should be possible to obtain the instances metadata without SSH connections.
Could you help me with that please?
Thank you.
There is information you can retrieve about instances using the AWS API, but ec2_facts does not use it. What that Ansible module does, specifically, is fetch metadata via http://169.254.169.254/latest/meta-data/, which can only be done from the instance itself.
It would help to know more about which instance data you want to fetch. At this time there is no AWS cloud module in core that retrieves general information about an instance, but Ansible makes it easy to write one.
Here is an example of a module that returns information about instances that match a set of tags: https://github.com/edx/configuration/blob/master/playbooks/library/ec2_lookup
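For reference, the kind of lookup that module performs can also be sketched directly with boto3 (the tag key/value and region below are assumptions); it queries the EC2 API rather than SSHing into each instance:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

pages = ec2.get_paginator("describe_instances").paginate(
    Filters=[{"Name": "tag:environment", "Values": ["staging"]}]  # assumed tag
)
for page in pages:
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            print(inst["InstanceId"], inst.get("PrivateIpAddress"),
                  inst["State"]["Name"])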

How to show table-like information in Nagios?

My setup is something like this: I have multiple servers, and each server runs multiple instances of the same service (a multi-tenant-like architecture). I want to get the status of all services running on all servers using SNMP.
The problem I am facing is: how can I show table-like information in Nagios?
That is, when I click on a particular server it should show me the list of services, and when I click on a service it should show me the list of instances of that particular service.
There's no such feature in Nagios. You will need to set up a separate check for each of the services running on the monitored host.
