I am using the GKE platform to implement a Kubernetes scheduler, and Prometheus with Grafana to monitor the applications.
To implement the scheduler in Go, I need to get the metrics as input to the scheduler.
Please suggest some ways to do this.
Please also point me to good documentation so that I can understand the topic easily.
I am a newbie, so I don't know anything about it.
Your help will be appreciated.
First, I would encourage you to read the documentation on the Kubernetes monitoring architecture, which explains the main concepts behind Kubernetes metrics. Since you use Prometheus as the main monitoring agent in the cluster, you are probably working with specific metrics exposed by the applications running there; when you implement a custom scheduler, the key task is to feed these metrics into the logic that determines the scheduler's behavior. A good example of this approach is the Sysdig monitoring tool, which can automatically collect Prometheus metrics and make them available across applications in the cluster.
You can also look at the custom scheduler project on GitHub, which is based on Sysdig monitoring metrics and driven by open-source community enthusiasts.
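If you would rather read the metrics straight from Prometheus, its HTTP API can be queried from Go with the standard library alone. The following is only a minimal sketch, not a full scheduler: the `PROM_URL` environment variable, the PromQL expression (which assumes node_exporter is deployed) and the helper names `parseInstantVector` and `leastLoaded` are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

// promResponse mirrors the instant-query response of the Prometheus HTTP API.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, "value as string"]
		} `json:"result"`
	} `json:"data"`
}

// parseInstantVector returns one float per series, keyed by the given label.
func parseInstantVector(body []byte, label string) (map[string]float64, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("prometheus returned status %q", r.Status)
	}
	out := make(map[string]float64)
	for _, s := range r.Data.Result {
		raw, ok := s.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", s.Value[1])
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, err
		}
		out[s.Metric[label]] = v
	}
	return out, nil
}

// queryPrometheus runs an instant query against GET /api/v1/query.
func queryPrometheus(baseURL, promQL, label string) (map[string]float64, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/api/v1/query?query=" + url.QueryEscape(promQL))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseInstantVector(body, label)
}

// leastLoaded picks the key with the lowest value (ties broken by name).
func leastLoaded(scores map[string]float64) string {
	best := ""
	for name, v := range scores {
		if best == "" || v < scores[best] || (v == scores[best] && name < best) {
			best = name
		}
	}
	return best
}

func main() {
	// Hypothetical sample payload in the shape the API returns.
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"instance":"node-a"},"value":[1700000000,"0.72"]},
		{"metric":{"instance":"node-b"},"value":[1700000000,"0.31"]}]}}`)
	scores, err := parseInstantVector(sample, "instance")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("least loaded (sample):", leastLoaded(scores))

	// Optional live query; PROM_URL is an assumed variable, e.g. http://localhost:9090.
	if base := os.Getenv("PROM_URL"); base != "" {
		q := `1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))`
		live, err := queryPrometheus(base, q, "instance")
		if err != nil {
			fmt.Println("query error:", err)
			return
		}
		fmt.Println("least loaded (live):", leastLoaded(live))
	}
}
```

A scheduler would call something like `queryPrometheus` on each scheduling cycle and use the resulting scores to rank candidate nodes.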
Related
Does a datadog agent generate metrics?
How does it collect metrics that the host's app generates?
Does it intrude into the app's code environment to collect metrics?
Let's say that the app is a Spring Boot app. It already generates a set of metrics with Micrometer and exposes them on the /metrics endpoint. How does a Datadog agent fit in here?
Now let's say the app is the same, but does not have Micrometer enabled.
How would datadog fit in here?
Would it be able to generate metrics from this app? If so, how does it do that? Furthermore, in doing so, does it access the application's source code? Or does it hook into the runtime and add bytecode to generate metrics by observing events?
Let's say that we have an application running on the host that already generates metrics and can ship them to network-accessible storage. Can Datadog be used just to collect the data and visualize it, without an agent?
Does datadog only collect metrics that are exposed by the host's app?
The reason I am curious about these aspects is that I want to analyze the vulnerability of the host in this respect, and to understand the added infrastructure resource overhead, the performance overhead and the cost involved.
At the same time, a more fundamental question remains: why Datadog?
Any thoughts on Dynatrace in the same respect?
I use Prometheus to gather k8s' resources.
The resource data pipeline is as follows:
k8s -> Prometheus -> Java app -> Elasticsearch -> (whghl) Java app
Here I have a question.
Why use Prometheus?
Isn't Prometheus unnecessary if the data ends up stored in a database, as in my setup?
Whether I use Elasticsearch or MongoDB, do I still need Prometheus?
It definitely depends on what exactly you are trying to achieve by using these tools. In general, the scope of usage is quite different.
Prometheus is specifically designed for metrics collection, system monitoring and alerting based on those metrics. That makes it the better choice if your primary requirement is to pull metrics from services and run alerts on them.
Elasticsearch, in turn, has a wider scope: it is used to store and search all kinds of data and to run different types of analytics on that data, and it is mostly used as a log analysis system. It can also be configured for monitoring, but unlike Prometheus it is not built specifically for that.
Both tools are good to use, but Prometheus provides more simplicity in setting up monitoring for Kubernetes.
I am new to Google PubSub. I am using Go for the client library.
How can I see the OpenCensus metrics that are recorded by the google-cloud-go library?
I have already successfully published a message to Google PubSub. Now I want to see these metrics, but I cannot find them in Google Stackdriver.
PublishLatency = stats.Float64(statsPrefix+"publish_roundtrip_latency", "The latency in milliseconds per publish batch", stats.UnitMilliseconds)
https://github.com/googleapis/google-cloud-go/blob/25803d86c6f5d3a315388d369bf6ddecfadfbfb5/pubsub/trace.go#L59
This is curious; I'm surprised to see these (machine-generated) APIs sprinkled with OpenCensus (Stats) integration.
I've not tried this but I'm familiar with OpenCensus.
One of OpenCensus' benefits is that it loosely-couples the generation of e.g. metrics from the consumption. So, while the code defines the metrics (and views), I expect (!?) the API leaves it to you to choose which Exporter(s) you'd like to use and to configure these.
In your code, you'll need to import the Stackdriver exporter (and any other exporters you wish to use) and then follow these instructions:
https://opencensus.io/exporters/supported-exporters/go/stackdriver/#creating-the-exporter
NOTE I encourage you to look at the OpenCensus Agent too as this further decouples your code; you reference the generic OpenCensus Agent in your code and configure the agent to route e.g. metrics to e.g. Stackdriver.
For Stackdriver, you will need to configure the exporter with a GCP Project ID and that project will need to have Stackdriver Monitoring enabled (and configured). I've not used Stackdriver in some months but this used to require a manual step too. The easiest way to check is to visit:
https://console.cloud.google.com/monitoring/?project=[[YOUR-PROJECT]]
If I understand the intent (!) correctly, I expect API calls will then record stats against the measures in the views defined in the code that you referenced.
Once you're confident that metrics are being shipped to Stackdriver, the easiest way to confirm this is to query a metric using Stackdriver's metrics explorer:
https://console.cloud.google.com/monitoring/metrics-explorer?project=[[YOUR-PROJECT]]
You may wish to test this approach using the Prometheus Exporter because it's simpler. After configuring the Prometheus Exporter, when you run your code, it will create an HTTP server and you can curl the metrics that are being generated on:
http://localhost:8888/metrics
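Instead of eyeballing the curl output, you could check for the metric programmatically. This is just a sketch using the Go standard library: it assumes the exporter is listening on the URL above, and it searches by substring (`publish_roundtrip_latency`, taken from the code you referenced) because I'm not certain what prefix or suffix the exporter adds to the name. `matchingSamples` is a made-up helper, not part of any library.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// matchingSamples scans Prometheus text-format output and returns the sample
// lines whose metric name contains substr. Comment lines (# HELP, # TYPE)
// and blank lines are skipped.
func matchingSamples(exposition, substr string) []string {
	var out []string
	for _, line := range strings.Split(exposition, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The metric name ends at the first '{' (labels) or space (value).
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if strings.Contains(name, substr) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get("http://localhost:8888/metrics")
	if err != nil {
		fmt.Println("could not reach the exporter:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("could not read the response:", err)
		return
	}
	lines := matchingSamples(string(body), "publish_roundtrip_latency")
	if len(lines) == 0 {
		fmt.Println("no publish latency samples found; is the view registered?")
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

If nothing matches after you have published at least one message, the usual suspects are a view that was never registered or an exporter that was never registered.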
NOTE OpenCensus is being (!?) deprecated in favor of a replacement solution called OpenTelemetry.
So I am currently running two clusters for data processing: one is a Kubernetes cluster and the other is a Hadoop cluster.
The K8s cluster is taken care of in terms of monitoring, since it was quite easy to deploy Prometheus and Grafana on it.
For the Hadoop cluster, however, I am still looking for a good way to do that.
The goal is to have a unified monitoring solution, so I thought it would be a good idea to go with Prometheus since I am already familiar with it, but it looks like it's not straightforward.
Hadoop by default exposes some metrics through HTTP API but those metrics are not "Prometheus-friendly".
I would appreciate it if you could explain how I can achieve this.
I suggest you look at this:
https://github.com/marcelmay/hadoop-hdfs-fsimage-exporter
In most cases, when an application does not expose Prometheus metrics itself, you can use an exporter; there are a lot of them.
They collect the application's metrics and expose them in a Prometheus-friendly format.
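To give an idea of what an exporter does internally, here is a rough sketch in Go of the conversion step only. It assumes the Hadoop daemon's HTTP metrics come back as JSON of the form `{"beans":[{"name": ..., <attribute>: <value>}]}`; the `hadoop` prefix, the `bean` label and the function name `jmxToPrometheus` are my own choices for illustration, not something an existing exporter prescribes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// invalidChars matches anything not allowed in a Prometheus metric name.
var invalidChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// jmxToPrometheus converts the numeric attributes of each bean into
// Prometheus text-format samples. Non-numeric attributes are ignored.
// Lines are sorted so the output is stable between scrapes.
func jmxToPrometheus(jmxJSON []byte, prefix string) (string, error) {
	var doc struct {
		Beans []map[string]interface{} `json:"beans"`
	}
	if err := json.Unmarshal(jmxJSON, &doc); err != nil {
		return "", err
	}
	var lines []string
	for _, bean := range doc.Beans {
		beanName, _ := bean["name"].(string)
		for attr, v := range bean {
			num, ok := v.(float64) // encoding/json decodes JSON numbers as float64
			if !ok {
				continue
			}
			metric := invalidChars.ReplaceAllString(prefix+"_"+attr, "_")
			lines = append(lines, fmt.Sprintf("%s{bean=%q} %g", metric, beanName, num))
		}
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n") + "\n", nil
}

func main() {
	// Hypothetical sample shaped like the assumed JSON output.
	sample := []byte(`{"beans":[{
		"name":"Hadoop:service=NameNode,name=FSNamesystem",
		"tag.Context":"dfs",
		"CapacityUsed":2048,
		"BlocksTotal":17}]}`)
	out, err := jmxToPrometheus(sample, "hadoop")
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Print(out)
}
```

A real exporter would fetch the JSON from the daemon on every scrape and serve the converted text on its own `/metrics` endpoint for Prometheus to pull.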
I want to scale out my EC2 instances on AWS. For this, it has been suggested that I use the Sensu framework.
I want to scale out the instances based on CPU usage. For testing I have configured Sensu on both Windows and Ubuntu (VirtualBox), and I'm running a client on Ubuntu by following this example. My CPU data is successfully passed to RabbitMQ.
Now I'm wondering how I can use that data in the Sensu server so that I can scale in or scale out. Any suggestions will be appreciated.
In case it matters, I will use this with Opscode Chef.
The easiest way to achieve your goal would be to connect the available components together (which will still require writing some code, see below) and refrain from adding custom solutions as much as possible:
Amazon EC2 offers Auto Scaling, which in turn is driven by metrics collected via Amazon CloudWatch. So metrics are key here, and that's exactly what Sensu is all about, see e.g. Sensu and Graphite, which covers two approaches for pushing metrics from Sensu to Graphite:
Remember: think of Sensu as the "monitoring router". While we are
going to show how to push metrics to Graphite, it is just as easy to
push metrics to any other system – Librato, Cube, OpenTSDB, etc. In
fact, it would not be difficult at all to push metrics to multiple
graphing backends in a fanout manner. [emphasis mine]
Your metrics are available in the Sensu server already, so you'll need to push them into CloudWatch now (just like explained for Graphite in the article above) and attach respective Auto Scaling policies to these in turn.
The currently available metrics handlers for Sensu target Graphite and Librato, so you'd need to implement such a Sensu handler for publishing custom metrics into CloudWatch (be sure to share it, it will definitely be widely used over time :)
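To make the handler idea a bit more concrete, here is a rough sketch of its parsing half, written in Go. It rests on a few assumptions: that the handler receives the event as JSON with the check output under `check.output`, and that the metric check emits Graphite plaintext lines (`name value timestamp`). The actual CloudWatch call is deliberately left as a placeholder print; you would replace it with a PutMetricData request through the AWS SDK of your choice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// datapoint is one parsed Graphite plaintext sample.
type datapoint struct {
	Name      string
	Value     float64
	Timestamp int64
}

// sensuEvent holds only the event fields this sketch needs (assumed layout).
type sensuEvent struct {
	Client struct {
		Name string `json:"name"`
	} `json:"client"`
	Check struct {
		Output string `json:"output"`
	} `json:"check"`
}

// parseGraphite parses "name value timestamp" lines and skips malformed ones.
func parseGraphite(output string) []datapoint {
	var points []datapoint
	for _, line := range strings.Split(output, "\n") {
		f := strings.Fields(line)
		if len(f) != 3 {
			continue
		}
		v, err := strconv.ParseFloat(f[1], 64)
		if err != nil {
			continue
		}
		ts, err := strconv.ParseInt(f[2], 10, 64)
		if err != nil {
			continue
		}
		points = append(points, datapoint{Name: f[0], Value: v, Timestamp: ts})
	}
	return points
}

// handle decodes one event and "publishes" each datapoint.
func handle(r io.Reader) error {
	var ev sensuEvent
	if err := json.NewDecoder(r).Decode(&ev); err != nil {
		return err
	}
	for _, p := range parseGraphite(ev.Check.Output) {
		// Placeholder: a real handler would send this to CloudWatch here.
		fmt.Printf("would publish %s=%g at %d for client %s\n",
			p.Name, p.Value, p.Timestamp, ev.Client.Name)
	}
	return nil
}

func main() {
	// Hypothetical sample event; a real pipe handler would pass os.Stdin.
	sample := `{"client":{"name":"web01"},"check":{"name":"cpu_metrics",` +
		`"output":"web01.cpu.user 12.5 1700000000\nweb01.cpu.idle 80 1700000000\n"}}`
	if err := handle(strings.NewReader(sample)); err != nil {
		fmt.Println("invalid event:", err)
	}
}
```

Once the datapoints land in CloudWatch as custom metrics, the Auto Scaling policies can be attached to alarms on them as described above.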
Good luck!