Good evening,
I’m a student at the University of Rome Tor Vergata. I’m currently working on my master’s thesis, which involves the use of Linkerd.
Very briefly, the thesis is about implementing a fully distributed root cause localization system for microservice architectures.
In the metrics collection phase I'm facing an issue with Linkerd, since I’m not using Prometheus but manually scraping metrics from the proxies through the /metrics endpoint.
I can’t understand how or when Linkerd’s proxies reset the various metrics they collect.
Does anybody know if they have a timer? Or is there a way to make them reset the metrics after scraping?
Thanks in advance for any help anyone will give me.
The metrics are stored in memory by the Linkerd proxy as soon as the proxy process starts running.
Most of the metrics are buckets for histograms whose main purpose is to view the data over time, so there isn't a way to reset them and they don't reset themselves.
You could write Prometheus queries that select the time windows at which you would otherwise have reset the metrics, or you could restart the containers and write queries that filter the metrics down to the newer workloads.
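Since the values only ever grow, one way to get per-interval numbers without Prometheus is to keep the previous scrape and subtract it from the current one. Below is a minimal sketch of that idea, assuming the /metrics output is in the Prometheus text format with no trailing timestamps. parse_metrics and deltas are hypothetical helper names, and the sample series is only illustrative.

```python
def parse_metrics(text):
    """Parse Prometheus text-format lines into {series: value}.

    Assumption: each sample line is "<name>{<labels>} <value>" with no
    trailing timestamp, so the value is the last space-separated field.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


def deltas(previous, current):
    """Return the increase of each series between two scrapes."""
    out = {}
    for series, value in current.items():
        before = previous.get(series)
        if before is None or value < before:
            # New series, or the proxy restarted and counting began again
            # from zero: the current value is the whole increase.
            out[series] = value
        else:
            out[series] = value - before
    return out


# Illustrative scrapes (not real proxy output).
first = parse_metrics('request_total{direction="inbound"} 10\n')
second = parse_metrics('request_total{direction="inbound"} 25\n')
print(deltas(first, second))  # {'request_total{direction="inbound"}': 15.0}
```

This only makes sense for counters and histogram buckets. Gauges should be read as they are.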
Related
Does a Datadog agent generate metrics?
How does it collect the metrics that the host's app generates?
Does it intrude into the app's code environment to collect metrics?
Let's say that the app is a Spring Boot app. It already has a set of metrics generated by Micrometer and exposed on the /metrics endpoint. How does a Datadog agent fit in here?
Let's say that the app is the same this time, but does not have Micrometer enabled.
How would Datadog fit in here?
Would it be able to generate metrics from this app? If so, how does it do that? Furthermore, in doing so, does it access the application's source code? Or does it get into the runtime and add bytecode to generate metrics by observing events?
Let's say that we have an application running on the host that already generates metrics and can ship them to network-accessible storage. Can Datadog be used just to collect the data and visualize it, without an agent?
Does Datadog only collect metrics that are exposed by the host's app?
The reason I am curious about these aspects is that I want to analyze the vulnerability of the host in this respect, and to understand the added overhead in terms of infrastructure resources, the performance overhead, and the cost involved.
At the same time, a bigger question stands: why Datadog?
Any thoughts on Dynatrace in the same respect?
Either I am really bad at searching, or there is no detailed comparison between App Insights and the ELK stack?
The monitoring is going to be used for a simple Web API. There are going to be tons of endpoints, but user traffic should not be too high.
So my question: are there any general points or differences to consider when choosing between ELK and App Insights? Personally, I have never had a chance to set up either of them, but before setting up a test environment it would be nice to know in advance what to expect and look for.
I'm from the App Insights team. I think the link provided by @rickvdbosch in a comment gives quite a good perspective. It is more than a year old at this point, so some items regarding App Insights have evolved since then.
I think App Insights and ELK are quite different offerings. The former is a managed offering (you can set it up within a couple of minutes), focused on a very broad range of out-of-the-box experiences (collecting incoming/outgoing requests, exceptions, smart alerts, availability monitoring, analytics, live metrics, application map, end-to-end transactions across apps).
My understanding of ELK is that it has very powerful UI visualization and powerful dashboards (though there are adapters for Kibana to work with Azure Monitor). For scenarios where there is a need to store a lot of data (with adaptive sampling, highly loaded apps still store only a limited amount of data), an ELK solution might be cheaper to run.
The final decision was to use ELK, as the servers already have all the configuration, because another team uses it, and mainly because logging will need a lot of customization.
I plan to set up monitoring for Redmine so that I can see the man-hours spent on tickets, the time taken to complete a ticket, etc., to monitor the productivity of my team. I want to see all of this in Grafana. As of now I am thinking of using Prometheus and exposing the metrics, but I am not sure how. (I might have to create an exporter, I think, but I am not sure if that would work.) So basically, how can this be done?
A Prometheus exporter is simply an HTTP server that sits next to your target (Redmine in your case, although I have no experience with it) and whenever it gets a /metrics request it does one or more API calls to the target (assuming Redmine provides an API to query the numbers you need) and returns said numbers as Prometheus metrics with names, labels etc.
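To make that concrete, here is a minimal sketch of such an exporter using only the Python standard library. Everything specific in it is an assumption for illustration: fetch_numbers is a hypothetical stand-in for the real API calls to the target, the metric name redmine_issues and its status label are made up, and port 8000 is arbitrary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_numbers():
    """Hypothetical stand-in for the API calls made to the target."""
    return {"open": 12, "closed": 30}


def render_metrics(counts):
    """Render {status: count} in the Prometheus text exposition format."""
    lines = [
        "# HELP redmine_issues Number of issues by status (illustrative).",
        "# TYPE redmine_issues gauge",
    ]
    for status, count in sorted(counts.items()):
        # Label values are not escaped here; fine for simple status names.
        lines.append('redmine_issues{status="%s"} %s' % (status, count))
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # Query the target on every scrape, then answer with plain text.
        body = render_metrics(fetch_numbers()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering /metrics requests on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the server, and Prometheus can then be pointed at it as a scrape target. A real exporter would more likely use one of the official client libraries instead of formatting the text by hand.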
Here are the Prometheus clients (that help expose metrics in the format accepted by Prometheus) for Go and Java (look for simpleclient_http or simpleclient_servlet). There is support for many other languages.
Adding on to @Alin's answer: to expose Redmine metrics to Prometheus, you would need to install an exporter.
https://github.com/mbeloshitsky/redmine_prometheus.git
The link above is a Redmine plugin available for Prometheus.
You can get the hours and all the data you need through the Redmine REST API. Write a little program to fetch the data and update it in Graphite or Prometheus. You can perform this task using Sensu by creating a metric script in Python, Ruby, or Perl. After that, all you have to do is plot the graphs. Well, that's another race :P
Redmine guide: http://www.redmine.org/projects/redmine/wiki/Rest_api_with_python
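As a rough idea of what the "little program" could do once it has the data, the sketch below sums the logged hours per user from a JSON payload. It assumes the payload has a "time_entries" list whose items carry "hours" and a "user" object with a "name"; check that shape against your Redmine version. hours_per_user is a hypothetical helper name, and the sample payload is made up.

```python
import json
from collections import defaultdict


def hours_per_user(payload):
    """Sum logged hours per user name from a time-entries JSON string."""
    data = json.loads(payload)
    totals = defaultdict(float)
    for entry in data.get("time_entries", []):
        # Assumed shape: {"user": {"name": ...}, "hours": ...}
        user = entry.get("user", {}).get("name", "unknown")
        totals[user] += float(entry.get("hours", 0))
    return dict(totals)


# Made-up sample payload, not real Redmine output.
sample = json.dumps({
    "time_entries": [
        {"user": {"name": "alice"}, "hours": 1.5},
        {"user": {"name": "bob"}, "hours": 3.0},
        {"user": {"name": "alice"}, "hours": 2.0},
    ]
})
print(hours_per_user(sample))  # {'alice': 3.5, 'bob': 3.0}
```

The resulting totals are what you would then push to Graphite or expose to Prometheus.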
I have two "slave" Prometheus servers, one in each of my Kubernetes clusters. I have one centralised Prometheus for federation and alerting.
Sometimes a "slave" stops delivering metrics. How can I detect it? How can I create an alert that catches such a situation?
Unfortunately, Prometheus always sees its federated peers as UP, no matter what.
We need a bit more information here. If up is 1 then everything is okay. The real question is why you're getting a successful scrape with no data. Have you tried debugging that?
I'd also suggest alerting as deep in the stack as you can, see https://www.robustperception.io/federation-what-is-it-good-for/
I was wondering if there is a tool to keep track of application performance. What I have in mind is a tool that will listen for updates and register performance metrics published by an application, for example the time to serve a request or the time a certain operation took to finish. This tool would then aggregate the data and measure performance trends.
If you want to measure your application from outside, then you can use RRDtool to collect the data.
You can use SLAMD for web apps written in Java.
For Django, use hotshot.
Otherwise, search for "profiler" plus your language or framework.
Take a look at HP SiteScope. Its ability to drive the system with a Web User Script and to monitor the metrics on the backend, even to the extent of creating custom shell scripts and database queries, plus the ability to add logic for reports and alerts against these combined data sets, appears to be what you need.
Another mechanism that you might consider is a roll-your-own service: use cURL to push information in, query the systems involved to pull metrics or database information, and then build your own interface for alerting and reporting.
Then it becomes a cost question: can you build that level of functionality for less money than it costs to purchase an existing solution on the open market?
Ref:
HP SiteScope Wiki Page