I use Prometheus to gather k8s' resources.
The resource data pipeline is as follows:
k8s -> Prometheus -> Java app -> Elasticsearch -> (whghl) Java app
Here I have a question.
Why use Prometheus?
Wouldn't Prometheus not be necessary if it was stored in DB like mine?
Whether I use Elasticsearch or MongoDB, wouldn't I need Prometheus?
It definitely depends on what exactly you are trying to achieve by using these tools. In general, the scope of usage is quite different.
Prometheus is specifically designed for metrics collection, system monitoring and creating alerts based on those metrics. That's why it is the better choice if it is primarily required to pull metrics from services and run alerts on them.
Elasticsearch in its turn is a system with wider scope, as it is used to store and search all data types, perform different types of analytics of this data - and mostly it is used as log analysis system. But it also can be configured for monitoring, though it is not particularly made for it, unlike Prometheus.
Both tools are good to use, but Prometheus provides more simplicity in setting up monitoring for Kubernetes.
Related
Does a datadog agent generate metrics?
How does it collect metrics that the host's app generates?
Does it intrude the app code environment to collect metrics?
Let's say that the app is a Spring Boot app. It has a set of metrics already being generated by Micrometer and is exposed on the /metrics endpoint. How does a datadog agent fit in, here?
Let's say that the app is the same this time. But, does not have micrometer enabled.
How would datadog fit in here?
Would it have the capability to generate metrics from this app? If so, how does it do the same? Furthermore, in doing so, does it access the application's source code? Or gets into the runtime and adds bytecode to generate metrics by observing the events?
Let's say that, we have an application running on the host, that already generates metrics and can ship it to a network accessible storage. Can datadog be used just to collect the data and visualize it? Without an agent?
Does datadog only collect metrics that are exposed by the host's app?
The reason I am curious to know these aspects is to analyze the vulnerability of the host with this respect, understand the added overhead in terms of infrastructural resources, understand the performance overhead and the cost involved.
At the same time, a stronger question that stands is, why datadog?
Any thoughts on Dynatrace in the same respect?
I am new in Google PubSub. I am using GoLang for the client library.
How to see the opencensus metrics that recorded by the google-cloud-go library?
I already success publish a message to Google PubSub. And now I want to see this metrics, but I can not find these metrics in Google Stackdriver.
PublishLatency = stats.Float64(statsPrefix+"publish_roundtrip_latency", "The latency in milliseconds per publish batch", stats.UnitMilliseconds)
https://github.com/googleapis/google-cloud-go/blob/25803d86c6f5d3a315388d369bf6ddecfadfbfb5/pubsub/trace.go#L59
This is curious; I'm surprised to see these (machine-generated) APIs sprinkled with OpenCensus (Stats) integration.
I've not tried this but I'm familiar with OpenCensus.
One of OpenCensus' benefits is that it loosely-couples the generation of e.g. metrics from the consumption. So, while the code defines the metrics (and views), I expect (!?) the API leaves it to you to choose which Exporter(s) you'd like to use and to configure these.
In your code, you'll need to import the Stackdriver (and any other exporters you wish to use) and then follow these instructions:
https://opencensus.io/exporters/supported-exporters/go/stackdriver/#creating-the-exporter
NOTE I encourage you to look at the OpenCensus Agent too as this further decouples your code; you reference the generic Opencensus Agent in your code and configure the agent to route e.g. metrics to e.g. Stackdriver.
For Stackdriver, you will need to configure the exporter with a GCP Project ID and that project will need to have Stackdriver Monitor enabled (and configured). I've not used Stackdriver in some months but this used to require a manual step too. Easiest way to check is to visit:
https://console.cloud.google.com/monitoring/?project=[[YOUR-PROJECT]]
If I understand the intent (!) correctly, I expect API calls will then record stats at the metrics in the views defined in the code that you referenced.
Once you're confident that metrics are being shipped to Stackdriver, the easiest way to confirm this is to query a metric using Stackdriver's metrics explorer:
https://console.cloud.google.com/monitoring/metrics-explorer?project=[[YOUR-PROJECT]]
You may wish to test this approach using the Prometheus Exporter because it's simpler. After configuring the Prometheus Exporter, when you run your code, it will be create an HTTP server and you can curl the metrics that are being generated on:
http://localhost:8888/metrics
NOTE Opencensus is being (!?) deprecated in favor of a replacement solution called OpenTelemetry.
So I am currently running dual clusters for data processing, one is a Kubernetes clusters and another is a Hadoop cluster.
K8s cluster is taken care of in terms of monitoring since it was quite easy to deploy Prometheus and Grafana on it.
For the Hadoop cluster however, I am still looking of a good way to do that.
The goal is to have a unified monitoring solution, so I though it would be a good idea to go with Prometheus since I am already familiar with it, but looks like it's not straight-forward.
Hadoop by default exposes some metrics through HTTP API but those metrics are not "Prometheus-friendly".
Would appreciate if you can explain how I can achieve this.
i suggest you look at this,
https://github.com/marcelmay/hadoop-hdfs-fsimage-exporter
in most cases when the application does not expose prometheus metrics you can use an exporter there are a lot of them.
they collect the metrics and expose them in a Prometheus friendly manner.
I am using GKE platform to implement a Kubernetes scheduler. I am using Prometheus Grafana to monitor the applications.
For implementing a scheduler in golang, I need to get the metrics as an input to the scheduler.
Please suggest me some methods to do so.
Also please suggest proper documentations so that I can easily understand the things.
I am a newbie, so I don't know anything it.
Your help will be appreciated.
First, I would encourage you to read some relevant documentation about Kubernetes monitoring architecture which explains a lot of useful information about main concepts of Kubernetes metrics. Since you have used Prometheus as a main monitoring cluster agent, you might be operating with some specific metrics exposed by the application in your Kubernetes cluster infrastructure; therefore when you plan to implement custom scheduler it should be the main factor to adapt these metrics in order to define the further scheduler behavior. The good example to achieve this goal can be Sysdig monitoring tool, as it can perform automatic collection of Prometheus metrics and propagate these metrics across applications in the cluster.
You can also visit Custom scheduler project on GitHub based on Sysdig monitoring metrics and driven by open-source community enthusiasts.
I am new to prometheus, and so I am not sure if high availability is part of Prometheus data store tsdb. I am not looking into something like having two prometheus server instances scraping data from the same exporter as that has high chance of having two tsdb data store which are out of sync.
It really depends on your requirements.
Do you need highly available alerting on your metrics? Prometheus can do that.
Do you need a highly available monitoring system that contains the last few hours of data for operational triage? Two prometheus instances are pretty good for that too.
Do you need long-term storage of timeseries data? Prometheus is not designed to accomplish this on its own. Either use the remote write functionality of prometheus to ship data to another TSDB that supports redundant storage (InfluxDB and Clickhouse are pretty promising here) but you are on the hook for de-duping data. Alternatively, consider Cortex.
For Kubernetes setup Using kube-prometheus (prometheus-operator), you can configure it using values.
and including thanos Would help in this situation
There is prometheus-postgresql-adapter that allows you to use PostgreSQL / TimescaleDB as a remote storage. The adapter enables multiple Prometheus instances (HA setup) to write to a single remote storage, so you have one source of truth. Recently, I've published a blog post about it [How to manage Prometheus high-availability with PostgreSQL + TimescaleDB] (https://blog.timescale.com/blog/prometheus-ha-postgresql-8de68d19b6f5/).
Disclaimer: I am one of the engineers behind the adapter