How to detect that federated prometheus stopped delivering metrics? - federation

I have two "slave" prometheus severs, one in each of my kubernetes clusters. I have one centralised prometheus for federation and alerting.
Sometimes, it happens that a "slave" stops delivering metrics. How to detect it? How to create an alert that catches such a situation.
Unfortunately, prometheus always sees its federated peers as UP. No matter what.

We need a bit more information here. If up is 1 then everything is okay. The real question is why you're getting a successful scrape with no data. Have you tried debugging that?
I'd also suggest alerting as deep in the stack as you can, see https://www.robustperception.io/federation-what-is-it-good-for/

Related

Make Zipkin (or any open-tracing framework) work with existing "trace id"

A bit of background:
We have around 10 Spring boot microservices, which communicate with each other via kafka. The logs of each microservice are sent to Kibana, and in case of any errors, we have to sift through Kibana logs.
The good thing is: at the start of any flow, a message-id is generated by one of our microservices, and that is propagated to all the others as part of the message transfer (which happens through kafka), so we can search for the message-id in the logs, and we can see the footprint of that flow across all our microservices.
The bad part: having to sift through tons of logs to get a basic idea of where things broke and why.
Now the Question:
So I was wondering if we can have some distributed tracing implemented, maybe through Zipkin (or some other open-tracing framework) that can work with the message-id that our ecosystem already produces, instead of generating a new one ?
Thank you for your time :)
I'm not entirely sure if that's what you mean, but you can use Jeager https://www.jaegertracing.io/ which checks if trace-id already exist in the invocation metadata and in it generate child trace id. Based on all trace ids call diagrams are generated

Linkerd proxies metrics reset

Good evening,
I’m a student from the university of Rome Tor Vergata. I’m currently working on my master thesis that involves the use of Linkerd.
Very briefly the thesis is about implementing a totally distributed root cause localization system for microservices architectures.
In the metrics collection phase I'm facing an issue with Linkerd since I’m not using Prometheus, but manually scraping metrics from proxies through the /metrics endpoint.
I can’t understand how or when do Linkerd’s proxies reset the various metrics they collect.
Does anybody know if they have a timer? Or is there a way to make them reset metrics after the scraping?
Thanks in advance for any help anyone will give me.
The metrics are stored in memory by the Linkerd proxy as soon as the proxy process starts running.
Most of the metrics are buckets for histograms whose main purpose is to view the data over time, so there isn't a way to reset them and they don't reset themselves.
You could write prometheus queries to select windows of time where you would reset the metrics or you could restart the containers and write queries to filter the metrics on the newer workloads.

Jaeger integration Effort in Microservices in nodejs

I am analysing at a very hight level how much effort would it be for jaeger integration in nodejs microservices.
Does it require code changes or only deployment. and if code changes is required, is code changes needed in first service (i.e. api-gateway) or all the services need to have code changes.
I would really appreciate if someone can give a rough idea of tasks and effort.
Great question and a very popular one too. In short, yes, code changes are required. Not just in one service but in all the services that a request will go through. You need to instrument all services to get continuous traces that will be able to tell you the story of a request as it travels through the system.

How to set up monitoring for Redmine tasks/issues in Grafana?

I plan to set up monitoring for Redmine, with the help of which I can see man-hours spent on tickets, time taken to complete a ticket etc to monitor the productivity of my team. I want to see all of these using Graphana. As of now I think using Prometheus and exposing the Metrics but not sure how. (Might have to create an exporter I think, but not sure if that would work). So basically how can this be possible?
A Prometheus exporter is simply an HTTP server that sits next to your target (Redmine in your case, although I have no experience with it) and whenever it gets a /metrics request it does one or more API calls to the target (assuming Redmine provides an API to query the numbers you need) and returns said numbers as Prometheus metrics with names, labels etc.
Here are the Prometheus clients (that help expose metrics in the format accepted by Prometheus) for Go and Java (look for simpleclient_http or simpleclient_servlet). There is support for many other languages.
Adding on to #Alin's answer to expose Redmine metrics to Prometheus. You would need to install an exporter.
https://github.com/mbeloshitsky/redmine_prometheus.git
Here is a redmine plugin available for prometheus.
You can get the hours and all the data you need through Redmine Rest APIs. Write a little program to fetch and update the data in Graphite or Prometheus. You can perform this task using sensu through creating a metric script in python,ruby or Perl. Next all you have to do is Plotting the graphs. Well thats another race :P
RedMine guide: http://www.redmine.org/projects/redmine/wiki/Rest_api_with_python

Communication between Heroku apps

I've build a distributed system consisting of several web-services and some web applications consuming them.
They are all hosted on Heroku.
Is there some way for request between these applications to be done "inside heroku" without going through the web.
Something analog to using localhost.
You are maybe in luck: such a feature has currently reached the experimental phase.
Let me take a moment to underscore that: this feature may disappear or change at any time. It's not supported, but bug reports are appreciated. Don't build a bank with it. Don't get yourself in a position to be incredibly sad if severe problems are found that render it unshippable and it's aborted.
However, it is still cool, and here it is: containerized-network
You can use, for example, the pub-sub interface of any of the hosted Redis solutions. Or any of the message brokers (IronMQ, RabbitMQ) to pass messages.

Resources