OpenTelemetry Java Auto Instrumentation - Trace context changing when passing over multiple kafka topics - open-telemetry

I have a process flow like the one below between three microservices, using Kafka as the event broker:
Service-1 (publish) -> (topic:1) -> (consume) Service-2 (publish) -> (topic:2) -> (consume) Service-3
For distributed tracing, opentelemetry-javaagent.jar is used for auto-instrumentation, with Jaeger as the backend. At runtime the traces are disjointed: at any point in time, trace correlation is maintained only between two services, as described below.
Service-1 produces with TraceId:1 and Service-2 consumes with TraceId:1, but when Service-2 produces to Service-3 the TraceId changes (e.g. TraceId:2), and Service-3 consumes with TraceId:2. Therefore, for any service I can see only 2 spans.
Can someone guide me on how to propagate the same trace context across all three microservices? The following binary versions are in use, and the Kafka broker version is 2.12-2.8.0:
opentelemetry-javaagent.jar(version:1.18.0)
kafka-clients-3.0.0.jar
spring-cloud-stream-3.0.8.RELEASE.jar
spring-kafka-2.5.3.RELEASE.jar
Thanks!
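To make the expected behaviour concrete, here is a minimal sketch of what "the same trace context" means for Service-2. It assumes the agent's default W3C `traceparent` header format (`version-traceid-spanid-flags`); the class and method names are made up for illustration, and this models only the header, not how the agent works internally. The outgoing header for topic:2 should keep the trace ID received on topic:1 and only change the span ID.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative only: models the W3C traceparent header, not the agent's internals. */
final class TraceparentSketch {

    /** Returns the 32-hex-character trace ID from a header like 00-<trace-id>-<span-id>-<flags>. */
    static String traceIdOf(String traceparent) {
        String[] parts = traceparent.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("Malformed traceparent: " + traceparent);
        }
        return parts[1];
    }

    /** Builds the header for an outgoing message: same trace ID, fresh span ID. */
    static String childOf(String incomingTraceparent) {
        String traceId = traceIdOf(incomingTraceparent); // also validates the header
        String[] parts = incomingTraceparent.split("-");
        String newSpanId = String.format("%016x", ThreadLocalRandom.current().nextLong());
        return parts[0] + "-" + traceId + "-" + newSpanId + "-" + parts[3];
    }

    public static void main(String[] args) {
        // Header as Service-2 would receive it from Service-1 (example value).
        String fromService1 = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        String toService3 = childOf(fromService1);
        System.out.println("incoming: " + fromService1);
        System.out.println("outgoing: " + toService3);
        System.out.println("same trace: " + traceIdOf(fromService1).equals(traceIdOf(toService3)));
    }
}
```

In the broken flow described above, the header Service-2 writes to topic:2 carries a different trace ID, which is what splits the trace in two.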

Related

intermittent issue with kafka (aws msk) consumer

We are facing a strange issue in only one of our environments (with the same consumer app).
Lag suddenly starts to build up on only one of the topics on the Kafka broker (which hosts multiple topics), with 10 consumer members in a single consumer group.
Multiple restarts, adding another pod of the consumer application, and changing default configuration properties (max poll records, session timeout) have NOT helped much so far.
We are looking for suggestions on how to debug the issue. We tried enabling Apache logs, CloudWatch, etc., but so far we have only seen that regular, periodic rebalancing is happening, even under a very low load of 7k messages waiting to be processed.
Below are env details:
App - Spring Boot app on version 2.7.2
Platform - AWS
Kafka - MSK
Kafka Broker - 3 brokers (version 2.8.x)
Consumer Group - 1 with 15 members (partition 8, Topic 1)

Spring cloud Sleuth starts a new trace instead of continuing spans in a single trace

I have 4 spring-boot applications (A, B, C and D).
The lifecycle of a transaction is as follows :
Application A is a Kafka Streams application, and it ultimately produces to a topic which is consumed by Application B.
Application B then consumes from the topic using @KafkaListener, does some processing, and then produces to an IBM MQ queue using Spring's JmsTemplate.
Application C, which is a @JmsListener, consumes from the above queue and produces to another queue using Spring's JmsTemplate.
Application D, which is again a @JmsListener, consumes from the above queue and then produces to a Kafka topic, which is in turn consumed by Application A.
Now, for a single transaction I would expect a single trace across all four applications, but instead I get:
One Trace starting from application A to application B (where it produces to IBM MQ)
One trace starting from Application C and ending at Application A
I would have uploaded the pictures to show the zipkin spans, but for some reason I am not able to do so.
All of the above applications are Spring Boot applications, and they use spring-cloud-sleuth to produce transaction traces. I am relying on Spring Boot's autoconfiguration, and these are the properties that I have set in all the applications:
zipkin:
  enabled: ${ZIPKIN_ENABLED:false}
  sender:
    type: kafka
  baseUrl: ${ZIPKIN_URL:http://localhost:9411}
  service:
    name: ${spring.application.name}
sleuth:
  messaging:
    kafka:
      enabled: true
    jms:
      enabled: true
I am not able to understand what exactly is happening here. Why are the spans scattered across two traces instead of one?
I am using spring-boot 2.3.3 and spring-cloud-dependencies Hoxton.SR8.
So it was application B which was not passing the header along. Turns out that the queue uri had a property targetClient which was set to 1. The uri is something like
queue:///DESTINATION_QUEUE?targetClient=1
Now I am not an IBM MQ expert by far, but the documentation states that setting this property to 1 means that Messages do not contain an MQRFH2 header. I toggled it to 0 and voila, all spans fall into place.
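For anyone wondering why a missing header splits the trace, here is a toy model of the mechanism. It is not Sleuth or IBM MQ code: the `Message` class, the `traceId` property name, and the `keepProperties` switch are all hypothetical stand-ins. The idea is simply that the trace ID travels as a message property, so if the transport drops properties, the consumer has nothing to continue and starts a fresh trace.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model only: shows why losing message properties starts a new trace. */
final class HeaderLossSketch {

    /** A body plus string properties, standing in for a JMS message. */
    static final class Message {
        final String body;
        final Map<String, String> properties = new HashMap<>();

        Message(String body) {
            this.body = body;
        }
    }

    /** Toy transport: with keepProperties=false the properties are dropped in transit. */
    static Message send(Message message, boolean keepProperties) {
        Message delivered = new Message(message.body);
        if (keepProperties) {
            delivered.properties.putAll(message.properties);
        }
        return delivered;
    }

    /** Consumer side: continue the incoming trace if present, otherwise start a new one. */
    static String resolveTraceId(Message delivered) {
        String incoming = delivered.properties.get("traceId"); // hypothetical property name
        return incoming != null ? incoming : UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        Message fromB = new Message("payload");
        fromB.properties.put("traceId", "trace-1");

        System.out.println("properties kept:    " + resolveTraceId(send(fromB, true)));
        System.out.println("properties dropped: " + resolveTraceId(send(fromB, false)));
    }
}
```

With properties kept, Application C would see trace-1; with properties dropped it generates its own ID, which matches the two separate traces described in the question.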

How to find the processing time of Kafka messages?

I have an application running Kafka consumers and want to monitor the processing time of each message consumed from the topic. The application is a Spring Boot application and exposes Kafka consumer metrics on the Spring Actuator Prometheus endpoint using the Micrometer registry.
Can I use kafka_consumer_commit_latency_avg_seconds or kafka_consumer_commit_latency_max_seconds to monitor or alert?
Those metrics have nothing to do with record processing time. spring-kafka provides metrics for that; see this section of its reference documentation:
Monitoring Listener Performance
Starting with version 2.3, the listener container will automatically create and update Micrometer Timers for the listener, if Micrometer is detected on the class path, and a single MeterRegistry is present in the application context. The timers can be disabled by setting the ContainerProperty micrometerEnabled to false.
Two timers are maintained - one for successful calls to the listener and one for failures.
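To illustrate what "one timer for successes and one for failures" measures, here is a plain-Java sketch of the idea. It is not spring-kafka's implementation and does not use Micrometer; the class name and counters are made up for the example. Each listener call is wrapped, timed with System.nanoTime(), and recorded under the outcome it ended with.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Sketch only: times listener calls and records them separately by outcome. */
final class ListenerTimingSketch {
    final AtomicLong successCount = new AtomicLong();
    final AtomicLong successNanos = new AtomicLong();
    final AtomicLong failureCount = new AtomicLong();
    final AtomicLong failureNanos = new AtomicLong();

    /** Wraps a listener so every call is timed and counted as a success or a failure. */
    <T> Consumer<T> timed(Consumer<T> listener) {
        return record -> {
            long start = System.nanoTime();
            try {
                listener.accept(record);
                successCount.incrementAndGet();
                successNanos.addAndGet(System.nanoTime() - start);
            } catch (RuntimeException e) {
                failureCount.incrementAndGet();
                failureNanos.addAndGet(System.nanoTime() - start);
                throw e; // rethrow so normal error handling still applies
            }
        };
    }

    public static void main(String[] args) {
        ListenerTimingSketch timing = new ListenerTimingSketch();
        Consumer<String> listener = timing.timed(message -> {
            if (message.isEmpty()) {
                throw new IllegalArgumentException("empty message");
            }
        });

        listener.accept("hello");
        try {
            listener.accept("");
        } catch (IllegalArgumentException expected) {
            // the failure has been recorded by the wrapper
        }
        System.out.println("successes=" + timing.successCount + " failures=" + timing.failureCount);
    }
}
```

This is per-record processing time, which is a different thing from the commit latency metrics asked about above.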

Liveness/Readiness set of health indicators for Spring Boot service running on top of Kafka Streams

How should health indicators be configured for a Spring Boot service running on top of Kafka Streams with a DB connection? We use Spring Cloud Stream with the Kafka Streams binder, Spring Data JPA, and Kubernetes as the container orchestrator. We have, let's say, 3 service replicas and 9 partitions for each topic. A typical service joins messages from two topics, persists data in a database, and publishes data back to another Kafka topic.
After switching to Spring Boot 2.3.1 and changing K8s liveness/readiness endpoints to the new ones:
/actuator/health/liveness
/actuator/health/readiness
we discovered that by default they do not have any health indicators included.
According to documentation:
Actuator configures the "liveness" and "readiness" probes as Health
Groups; this means that all the Health Groups features are available
for them. (...) By default, Spring Boot does not add other Health
Indicators to these groups.
I believe that this is the right approach, but I have not tested that:
management.endpoint.health.group.readiness.include: readinessState,db,binders
management.endpoint.health.group.liveness.include: livenessState,ping,diskSpace
We try to cover the following use cases:
rolling update: no consumption slot is available (idle instance) when a new replica is added
stream has died (runtime exception has been thrown)
DB is not available during container start up / when service is running
broker is not available
I have found a similar question, however I believe the current one is specifically related to Kafka services. They are different in nature from REST services.
Update:
In Spring Boot 2.3.1, the binders health indicator checks whether streams are in the RUNNING or REBALANCING state for Kafka 2.5 (previously only RUNNING), so I guess the rolling-update case with an idle instance is handled by its logic.
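The check described in the update boils down to something like the following sketch. The enum is a local stand-in I wrote for the example, not the real KafkaStreams.State type, and the method is not the binder's actual code; it only shows the rule that RUNNING and REBALANCING both count as healthy while every other state does not.

```java
/** Sketch only: maps a stream state to a health status as described in the update. */
final class StreamsHealthSketch {

    /** Local stand-in for the Kafka Streams state names; not the real KafkaStreams.State. */
    enum StreamState { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR }

    /** Reports UP only while the stream is running or rebalancing. */
    static String health(StreamState state) {
        boolean healthy = state == StreamState.RUNNING || state == StreamState.REBALANCING;
        return healthy ? "UP" : "DOWN";
    }

    public static void main(String[] args) {
        for (StreamState state : StreamState.values()) {
            System.out.println(state + " -> " + health(state));
        }
    }
}
```

Under this rule an idle replica that is still rebalancing during a rolling update stays UP, while a stream that died with an exception (ERROR or NOT_RUNNING here) reports DOWN.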

Spring Boot Micro Service Tracing Options

I have the requirements below. Is there an open source library that covers all of them?
1. We are building a distributed microservice architecture with Spring Boot, which includes more than 100 microservices.
2. A lot of inter-microservice communication may be needed to complete a single transaction.
3. We want to trace every microservice call, and the trace should provide the following information:
a. Transaction ID/Trace ID
b. Back-end transaction status: HTTP status for REST, and likewise for SOAP.
c. Time taken for the call.
d. Request and response payload.
Currently we achieve this using an in-house tracing framework. Is there an open source project that handles all of this without any coding by the developer? I know there are a few options with Spring Boot, such as Spring Cloud Sleuth and Zipkin. Do these handle the above requirements?
My project has similar requirements to yours. IMHO, Spring-cloud-sleuth + Zipkin work well in my case.
For all inter-microservice communication we are using Kafka, and Spring-cloud-sleuth + Zipkin has no problem tracing all the calls, from REST -> Kafka -> More Kafka -> REST.
To enable Kafka tracing, simply add:
spring:
  sleuth:
    propagation-keys: some-key
    sampler:
      probability: 1
    messaging:
      kafka:
        enabled: true
We are also using Azure Application Insights for centralized logging, which is well integrated with Spring Cloud.
Hope the above gives you some confidence in using Sleuth + Zipkin.
