Query resources using UID in Kubernetes client-go - go

I'm mainly using the k8s client-go library to perform CRUD operations on CRDs.
Also, in Kubernetes every resource has a unique UID.
Is there a way to query resources in Kubernetes using this UID? If so, could you please share any resources or code snippets that would help with this?
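To my knowledge, the API server does not expose metadata.uid as a general field selector, so a common approach is to list the resources and match the UID on the client side. Below is a minimal sketch using client-go's dynamic client; the example.com/v1 "widgets" resource and the UID value are placeholders for your own CRD.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// findByUID lists custom resources in a namespace and returns the one whose
// metadata.uid matches. The match happens client-side because UID is not a
// supported server-side field selector for most resources.
func findByUID(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource, namespace string, uid types.UID) (*unstructured.Unstructured, error) {
	list, err := client.Resource(gvr).Namespace(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	for i := range list.Items {
		if list.Items[i].GetUID() == uid {
			return &list.Items[i], nil
		}
	}
	return nil, fmt.Errorf("no resource found with UID %s", uid)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Hypothetical CRD coordinates; replace with your own group/version/resource.
	gvr := schema.GroupVersionResource{Group: "example.com", Version: "v1", Resource: "widgets"}
	obj, err := findByUID(context.TODO(), client, gvr, "default", types.UID("00000000-0000-0000-0000-000000000000"))
	if err != nil {
		panic(err)
	}
	fmt.Println("found:", obj.GetName())
}
```

The same pattern works with a generated typed client: list the objects and compare obj.GetUID() against the UID you are looking for.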

Related

Deploying jaeger on AWS ECS with Elasticsearch

How should I go about deploying Jaeger on AWS ECS with Elasticsearch as the backend? Is it a good idea to use the Jaeger all-in-one image, or should I use separate images?
While I didn't find any official Jaeger reference for this, I don't think the Jaeger all-in-one image is intended for production use. It makes a single container a single point of failure, so it is better to run separate containers for each Jaeger component (if one goes down for some reason, the others can continue to operate).
I recently wrote a blog post about hosting Jaeger on AWS with the AWS Elasticsearch (OpenSearch) service. While it uses the all-in-one image, it is still useful for getting a general idea of how to go about this.
Just to generally outline the process (described in detail in the post):
Create an AWS Elasticsearch cluster
Create an ECS Cluster (running on EC2)
Create an ECS Task Definition configured with the Jaeger all-in-one image and the Elasticsearch URL from step 1
Create an ECS Service that runs the created task definition
Make sure the security groups on your EC2 instances allow access to the Jaeger ports as described here
Send spans to your Jaeger endpoint via the OpenTelemetry SDK (see the sketch after this list)
View your spans in the hosted Jaeger UI (your-ec2-url:16686)
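As a rough illustration of the "send spans" step, here is a minimal sketch using the OpenTelemetry Go SDK with its Jaeger exporter. The endpoint host is a placeholder; newer OpenTelemetry releases prefer the OTLP exporter, but the overall shape is the same.

```go
package main

import (
	"context"
	"log"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Point the exporter at the collector HTTP endpoint exposed by the
	// all-in-one container (port 14268). Replace the host with your EC2 URL.
	exp, err := jaeger.New(jaeger.WithCollectorEndpoint(
		jaeger.WithEndpoint("http://your-ec2-url:14268/api/traces"),
	))
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithResource(resource.NewSchemaless(
			attribute.String("service.name", "my-service"), // placeholder service name
		)),
	)
	defer func() {
		// Flush any remaining spans before exiting.
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		defer cancel()
		_ = tp.Shutdown(ctx)
	}()
	otel.SetTracerProvider(tp)

	// Emit a single demo span.
	_, span := otel.Tracer("demo").Start(context.Background(), "demo-span")
	span.End()
}
```

After running this, the "demo-span" trace should show up in the Jaeger UI at your-ec2-url:16686 under the "my-service" service.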
The all-in-one image is a useful tool in development for testing your work locally.
For deployment it is very limiting. Ideally, to handle a potentially large volume of traffic, you will want to scale parts of your infrastructure.
I would recommend deploying multiple jaeger-collector instances configured to write to the ES cluster. Then you can run jaeger-agents as sidecars next to each app or service that emits telemetry. These agents can be configured to forward to any collector in a list, adding some extra resilience.

How does kubectl configure CRDs?

There are already some CRDs defined in my Kubernetes cluster.
kubectl can create/update/delete those resources just fine.
When I tried to do those operations programmatically, the approach I found by searching was to generate code with the tool below:
https://github.com/kubernetes/code-generator
I'm wondering why kubectl can do it out of the box without generating code for CRDs.
Is it necessary to generate code in order to add or delete a CRD resource?
Thanks!
First, let's understand what a CRD is.
The CustomResourceDefinition API resource allows you to define custom resources. Defining a CRD object creates a new custom resource with a name and schema that you specify. The Kubernetes API serves and handles the storage of your custom resource. The name of a CRD object must be a valid DNS subdomain name.
This frees you from writing your own API server to handle the custom resource, but the generic nature of the implementation means you have less flexibility than with API server aggregation.
Why would one create Custom Resources:
A resource is an endpoint in the Kubernetes API that stores a collection of API objects of a certain kind. For example, the built-in pods resource contains a collection of Pod objects.
A custom resource is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. It represents a customization of a particular Kubernetes installation. However, many core Kubernetes functions are now built using custom resources, making Kubernetes more modular.
Custom resources can appear and disappear in a running cluster through dynamic registration, and cluster admins can update custom resources independently of the cluster itself. Once a custom resource is installed, users can create and access its objects using kubectl, just as they do for built-in resources like Pods.
So to answer your question: if you need functionality that is missing from Kubernetes, you have to create it yourself using CRDs. Without them, the cluster won't know what you want or how to provide it.
If you are looking for examples of how to use the Kubernetes client-go library, you can find them on the official GitHub under client-go/examples.
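For completeness: kubectl can handle custom resources out of the box because it discovers the available resource types from the API server and manipulates them generically as unstructured objects, without any generated client. client-go offers the same option through its dynamic client, so generated code is convenient (typed clients, informers, deep copies) but not strictly required. Here is a minimal sketch, assuming a hypothetical example.com/v1 Widget CRD is already installed in the cluster:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Hypothetical CRD coordinates; substitute your own group/version/resource.
	gvr := schema.GroupVersionResource{Group: "example.com", Version: "v1", Resource: "widgets"}

	// Create a custom resource without any generated typed client.
	obj := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "example.com/v1",
		"kind":       "Widget",
		"metadata":   map[string]interface{}{"name": "demo"},
		"spec":       map[string]interface{}{"size": 3},
	}}
	created, err := client.Resource(gvr).Namespace("default").Create(context.TODO(), obj, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created:", created.GetName())

	// Delete it again.
	if err := client.Resource(gvr).Namespace("default").Delete(context.TODO(), "demo", metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
}
```

Code generation becomes worthwhile once you want compile-time-typed structs, listers, and informers for your CRD, for example when writing a controller.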

Deploying services in GKE k8s cluster using Golang k8s client

I'm able to create a GKE cluster using the golang container lib here.
Now, for my Golang k8s client to be able to deploy my k8s deployment files there, I need to get the kubeconfig from the GKE cluster. However, I can't find the relevant API for that in the container lib above. Can anyone please point out what I am missing?
As per @Subhash's suggestion, I am posting the answer from this question:
The GKE API does not have a call that outputs a kubeconfig file (or fragment). The specific processing between fetching a full cluster definition and updating the kubeconfig file is implemented in Python in the gcloud tooling. It isn't part of the Go SDK, so you'd need to implement it yourself.
You can also try using kubectl config set-credentials (see this) and/or see if you can vendor the libraries that implement that function, if you want to do it programmatically.
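If you do want to build the client configuration yourself in Go, one commonly used pattern (sketched below with placeholder project/location/cluster names) is to fetch the cluster endpoint and CA certificate from the GKE API and pair them with a token from Google application-default credentials. Note the token is short-lived, so a real implementation would refresh it rather than set it once.

```go
package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"log"

	"golang.org/x/oauth2/google"
	container "google.golang.org/api/container/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	ctx := context.Background()

	// Look up the cluster's endpoint and CA certificate via the GKE API.
	svc, err := container.NewService(ctx)
	if err != nil {
		log.Fatal(err)
	}
	name := "projects/my-project/locations/us-central1-a/clusters/my-cluster" // placeholders
	cluster, err := svc.Projects.Locations.Clusters.Get(name).Context(ctx).Do()
	if err != nil {
		log.Fatal(err)
	}
	caCert, err := base64.StdEncoding.DecodeString(cluster.MasterAuth.ClusterCaCertificate)
	if err != nil {
		log.Fatal(err)
	}

	// Use application-default credentials as the bearer token.
	ts, err := google.DefaultTokenSource(ctx, "https://www.googleapis.com/auth/cloud-platform")
	if err != nil {
		log.Fatal(err)
	}
	tok, err := ts.Token()
	if err != nil {
		log.Fatal(err)
	}

	cfg := &rest.Config{
		Host:            "https://" + cluster.Endpoint,
		BearerToken:     tok.AccessToken, // short-lived; refresh in a real client
		TLSClientConfig: rest.TLSClientConfig{CAData: caCert},
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	ver, err := clientset.Discovery().ServerVersion()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to", cfg.Host, "running", ver.GitVersion)
}
```

With the resulting clientset you can apply your deployment manifests the same way you would against any other cluster.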

Elasticsearch with Google BigQuery

I have event logs loaded into an Elasticsearch engine and I visualise them using Kibana. My event logs are actually stored in a Google BigQuery table. Currently, I dump the JSON files to a Google Cloud Storage bucket and download them to a local drive. Then, using Logstash, I move the JSON files from the local drive into the Elasticsearch engine.
Now I am trying to automate the process by establishing a connection between Google BigQuery and Elasticsearch. From what I have read, I understand that there is an output connector which sends data from Elasticsearch to Google BigQuery, but not vice versa. I'm just wondering whether I should upload the JSON files to a Kubernetes cluster and then establish the connection between the cluster and the Elasticsearch engine.
Any help in this regard would be appreciated.
Although this solution may be a little complex, I suggest using the Google Cloud Storage Connector together with ES-Hadoop. Both are very mature and used in production by many large companies.
Running Logstash across a lot of pods on Kubernetes would be very expensive and, I think, not a very nice, resilient, or scalable approach.
Apache Beam has connectors for BigQuery and Elasticsearch. I would definitely do this with Dataflow, so you don't need to implement a complex ETL pipeline and staging storage. You can read the data from BigQuery using BigQueryIO.Read.from (take a look at this if performance is important: BigQueryIO Read vs fromQuery) and load it into Elasticsearch using ElasticsearchIO.write().
Refer to this example of how to read data from BigQuery in Dataflow:
https://github.com/GoogleCloudPlatform/professional-services/blob/master/examples/dataflow-bigquery-transpose/src/main/java/com/google/cloud/pso/pipeline/Pivot.java
And this one for Elasticsearch indexing:
https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/dataflow-elasticsearch-indexer
UPDATED 2019-06-24
The BigQuery Storage API was released earlier this year; it improves the parallelism of data extraction from BigQuery and is natively supported by Dataflow. Refer to https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-api for more details.
From the documentation
The BigQuery Storage API allows you to directly access tables in BigQuery storage. As a result, your pipeline can read from BigQuery storage faster than previously possible.
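If you would rather prototype the move directly in Go instead of building a Beam/Dataflow pipeline, a rough sketch using the official BigQuery and Elasticsearch Go clients might look like the following. The project, dataset, table, and index names are placeholders, and a production job would batch documents through the bulk API (or use Dataflow, as suggested above) rather than index row by row.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"log"

	"cloud.google.com/go/bigquery"
	elasticsearch "github.com/elastic/go-elasticsearch/v7"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()

	bq, err := bigquery.NewClient(ctx, "my-project") // placeholder project ID
	if err != nil {
		log.Fatal(err)
	}
	defer bq.Close()

	es, err := elasticsearch.NewDefaultClient() // reads ELASTICSEARCH_URL from the environment
	if err != nil {
		log.Fatal(err)
	}

	// Placeholder query; replace with your event-log table.
	it, err := bq.Query("SELECT * FROM `my-project.my_dataset.event_logs`").Read(ctx)
	if err != nil {
		log.Fatal(err)
	}

	for {
		var row map[string]bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		doc, err := json.Marshal(row)
		if err != nil {
			log.Fatal(err)
		}
		// Index each row as a JSON document; a real job would use the bulk API.
		res, err := es.Index("event-logs", bytes.NewReader(doc))
		if err != nil {
			log.Fatal(err)
		}
		res.Body.Close()
	}
}
```

This keeps the whole flow in one small program, which is fine for modest or one-off volumes; for large tables the Storage API plus Dataflow approach above will scale much better.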
I have recently worked on a similar pipeline. The workflow I would suggest uses either the Google Cloud Storage connector mentioned above, or other methods, to read your JSON files into a Spark job. You should be able to transform your data quickly and easily, and then use the elasticsearch-spark plugin to load it into your Elasticsearch cluster.
You can use Google Cloud Dataproc or Cloud Dataflow to run and schedule your job.
As of 2021, there is a Dataflow template that allows a "GCP native" connection between BigQuery and Elasticsearch.
More information is available here in a blog post by elastic.co.
Further documentation and a step-by-step process are provided by Google.

Connect a Hadoop cluster to multiple Google Cloud Storage buckets in multiple Google projects

Is it possible to connect my Hadoop cluster to multiple Google Cloud projects at once?
I can easily use any Google Cloud Storage bucket in a single Google project via the Google Cloud Storage Connector, as explained in this thread: Migrating 50TB data from local Hadoop cluster to Google Cloud Storage. But I can't find any documentation or example of how to connect to two or more Google Cloud projects from a single map-reduce job. Do you have any suggestions or tricks?
Thanks a lot.
Indeed, it is possible to connect your cluster to buckets from multiple different projects at once. Ultimately, if you're following the instructions for using a service-account keyfile, the GCS requests are performed on behalf of that service account, which can be treated more or less like any other user. You can either add the service account email your-service-account-email@developer.gserviceaccount.com to all the different cloud projects that own the buckets you want to process (using the permissions section of cloud.google.com/console and simply adding that email address like any other member), or you can set GCS-level access to add that service account like any other user.
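This isn't the Hadoop connector itself, but the same principle is easy to see with the GCS Go client library: bucket names are globally unique, so one client authenticated as that service account can read buckets owned by different projects, provided the account has been granted access in each of them. A hedged sketch with placeholder bucket and object names:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"

	"cloud.google.com/go/storage"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()

	// Authenticate as the service account that was added to every project.
	client, err := storage.NewClient(ctx, option.WithCredentialsFile("service-account.json"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Bucket names are globally unique, so the same client can address
	// buckets owned by different projects (placeholder names below).
	for _, bucket := range []string{"bucket-in-project-a", "bucket-in-project-b"} {
		r, err := client.Bucket(bucket).Object("some/object.txt").NewReader(ctx)
		if err != nil {
			log.Fatal(err)
		}
		n, err := io.Copy(io.Discard, r)
		r.Close()
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("read %d bytes from gs://%s\n", n, bucket)
	}
}
```

The Hadoop GCS connector behaves the same way: once the configured service account has permission in each project, a single map-reduce job can reference gs:// paths from any of those buckets.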

Resources