OpenTracing: tracing siblings

I have a webserver which does some async processing by pushing data to a Kafka consumer and returns a response. I want to be able to trace the web request and the Kafka consumer in a single trace. Is it possible to do this?
I tried to start the Kafka consumer's span as a child of the first span, but since the first span ends when the API request ends, tracing tools show wrong timings for the trace. I know I'm doing it the wrong way.
Can someone please point me in the right direction for implementing such a requirement?

You need to use a FOLLOWS_FROM reference instead of a CHILD_OF reference for the span created in the Kafka consumer. A FOLLOWS_FROM reference links the consumer span to the request span without requiring it to finish within the parent's lifetime, so the trace timings stay correct. See the OpenTracing specification for the exact semantics of the two reference types.
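A minimal sketch with the OpenTracing Java API, assuming the request span's context was injected into the Kafka record headers on the producer side; the carrier map and the operation name are illustrative:

```java
import io.opentracing.References;
import io.opentracing.Span;
import io.opentracing.SpanContext;
import io.opentracing.Tracer;
import io.opentracing.propagation.Format;
import io.opentracing.propagation.TextMapAdapter;

import java.util.Map;

public final class ConsumerTracing {

    static void traceConsume(Tracer tracer, Map<String, String> headerMap) {
        // Rebuild the upstream (web request) span context from the carrier.
        SpanContext requestContext =
                tracer.extract(Format.Builtin.TEXT_MAP, new TextMapAdapter(headerMap));

        // FOLLOWS_FROM: the consumer span is caused by the request span but is
        // not timed inside it, so the trace view stays correct.
        Span consumerSpan = tracer.buildSpan("kafka-consume")
                .addReference(References.FOLLOWS_FROM, requestContext)
                .start();
        try {
            // ... process the record ...
        } finally {
            consumerSpan.finish();
        }
    }
}
```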

Related

spring cloud sleuth - how to propagate trace id and span id to a listener when using spring cloud aws s3

I have a bean which is a MessageHandler to handle an incoming message. The message handler is of type org.springframework.integration.aws.outbound.S3MessageHandler, which uploads the message to Amazon S3. The issue is that the operations of this message handler are performed in a different thread. How can I ensure that the transaction id is propagated all the way to the thread performing this operation?
DEBUG [app-name,,] 22540 --- [anager-worker-1]
Also attached to this message handler is a progress listener of type com.amazonaws.services.s3.transfer.internal.S3ProgressListener. The callbacks to this listener are performed within a different thread altogether. I need the trace ids in this listener too.
INFO [app-name,,] 22540 --- [callback-thread]
You may use the MDC logging feature and log your trace id and span id.
The key problem is that you have several threads doing the processing, which leads to the question of how to correlate the logs between them. So:
Put the trace id, span id, and a unique message id into the MDC of the thread that processes the request
Put the unique message id into the MDC of the thread that the MessageHandler uses
Modify your appenders to print the trace id, span id, and unique message id into the logs
Such fields can be indexed well by ELK (or any similar tool, even grep).
Then, by searching the logs for the unique message id, you'll find all log records plus the thread names and other details like the trace id and span id.
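A rough sketch of that idea with SLF4J's MDC; the "messageId" key and the way the id is handed over to the worker thread (e.g. as a message header) are assumptions for illustration:

```java
import java.util.UUID;
import org.slf4j.MDC;

public final class MdcCorrelation {

    // Request-handling thread: tag the MDC and return the id so it can be
    // passed along with the message (e.g. as a header).
    public static String tagRequestThread() {
        String messageId = UUID.randomUUID().toString();
        MDC.put("messageId", messageId);
        return messageId;
    }

    // Worker / S3 callback thread: restore the id so every log line written
    // by the wrapped work carries it.
    public static void runCorrelated(String messageId, Runnable work) {
        MDC.put("messageId", messageId);
        try {
            work.run();
        } finally {
            MDC.remove("messageId");
        }
    }
}
```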
Sleuth can do this for you; you need to check the docs on how to use its API: https://docs.spring.io/spring-cloud-sleuth/docs/current/reference/html/using.html#using-creating-and-ending-spans
As you can see there, you can create a Span and you can also create a Scope (a SpanInScope). The Scope is responsible for the context propagation, e.g. handling the MDC for you, so you don't need to do anything with the MDC.
What you need to do, though, is use this API properly so the context is propagated. This is usually done by instrumenting the ExecutorService/CompletionService and/or the Runnable/Callable (whichever you use). Here's how to implement such a thing:
Get the current Span in the caller thread (tracer.currentSpan())
Use this span and create a Scope (and a new Span if you need one) in the "other" thread (inside the thread pool)
You don't need to do this on your own: Sleuth has a TraceRunnable (which does exactly what I described above) and a TraceableExecutorService.
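A minimal sketch of the manual variant, following the Sleuth Tracer API from the linked docs; the span name "s3-upload" and the plain ExecutorService are illustrative assumptions:

```java
import java.util.concurrent.ExecutorService;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;

public final class ManualPropagation {

    static void submitTraced(Tracer tracer, ExecutorService executor, Runnable work) {
        Span parent = tracer.currentSpan();                          // 1. capture in the caller thread
        executor.submit(() -> {
            Span child = tracer.nextSpan(parent).name("s3-upload").start();
            try (Tracer.SpanInScope ws = tracer.withSpan(child)) {   // 2. scope in the worker thread
                work.run();                                          // MDC now carries the trace/span ids
            } finally {
                child.end();
            }
        });
    }
}
```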

Workflow modeling problem in Spring Integration

I have a problem creating/modeling an integration flow for the following overall use case:
Input to the system is some kind of Message. That message goes through a Splitter and a Transformer endpoint and after that to a ServiceActivator where the transformed message is processed. This use case is clear to me.
The confusion comes from the next part. After the ServiceActivator finishes processing, I need to take the base Message (the message from the beginning of the first part) again and put it through other processing, for example again through a Splitter and a Transformer. How can I model that use case? Can I return the message payload to that base value? Is there some component that could help me?
Hope I described it well.
Your use case sounds more like a PublishSubscribeChannel: https://docs.spring.io/spring-integration/docs/current/reference/html/core.html#channel-implementations-publishsubscribechannel. So, you are going to have several subscribers (splitters) for that channel, and the same input message is going to be processed in those independent sub-flows. You can even do that in parallel if you configure an Executor on that PublishSubscribeChannel.
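A minimal Java DSL sketch of that fan-out; the channel name and the handlers are hypothetical:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
public class FanOutConfig {

    @Bean
    public IntegrationFlow fanOutFlow() {
        return IntegrationFlows.from("inputChannel")
                .publishSubscribeChannel(pubSub -> pubSub
                        // first sub-flow: split the message and process each item
                        .subscribe(flow -> flow
                                .split()
                                .handle((payload, headers) -> { /* first processing */ return null; }))
                        // second sub-flow receives the same original message
                        .subscribe(flow -> flow
                                .split()
                                .handle((payload, headers) -> { /* second processing */ return null; })))
                .get();
    }
}
```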
Another way, if you can do that in parallel and you still need some result from that ServiceActivator to be available alongside the original message for the next endpoint or so, is to use a HeaderEnricher to store the original message in the headers and get access to it whenever you need it in your flow: https://docs.spring.io/spring-integration/docs/current/reference/html/message-transformation.html#header-enricher
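A small sketch of that second option; the header name "originalMessage" and the channel names are only examples:

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;

@Configuration
public class KeepOriginalConfig {

    @Bean
    public IntegrationFlow keepOriginalFlow() {
        return IntegrationFlows.from("inputChannel")
                // Stash the whole original message in a header before it is transformed.
                .enrichHeaders(h -> h.headerFunction("originalMessage", message -> message))
                .split()
                .handle((payload, headers) -> { /* main processing */ return payload; })
                // Later endpoints can read headers.get("originalMessage") to get back
                // to the untouched input.
                .channel("nextChannel")
                .get();
    }
}
```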

What is the most efficient way to know that a Kafka event is visible in a K-Table?

We use Kafka topics as both events and a repository. Using the kafka-streams API we define a simple K-Table that represents all the events in the topic.
In our use case we publish events to the topic and subsequently reference the K-Table as the backing repository. The main issue is that the published events are not immediately visible on the K-Table.
We tried transactions and exactly once semantics as described here (https://kafka.apache.org/26/documentation/streams/core-concepts#streams_processing_guarantee) but there is always a delay we cannot control.
Publish Event
Undetermined amount of time
Published Event is visible in the K-Table
Is there a way to eliminate the delay, or otherwise know that a specific event has been consumed by the K-Table?
NOTE: We tried both partition and global tables with similar results.
Thanks
Because Kafka is an asynchronous system, the observed delay is expected and you cannot do anything to avoid it.
However, if you publish a message to a topic, the KafkaProducer allows you to pass a Callback to the send() method, and the callback will be executed after the message has been written to the topic, providing the record's metadata (topic, partition, and offset).
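For instance, a sketch of that callback (topic, key, and value names below are hypothetical):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public final class PublishWithAck {

    static void publish(KafkaProducer<String, String> producer, String key, String value) {
        producer.send(new ProducerRecord<>("events", key, value), (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();
            } else {
                // The broker has accepted the record at this topic/partition/offset.
                System.out.printf("written to %s-%d@%d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
            }
        });
    }
}
```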
After Kafka Streams has processed messages, it will eventually commit the offsets (you can configure the commit interval, too). Thus, you can know that the message is in the KTable after its offset was committed. By default, committing only happens every 30 seconds, and it's not recommended to use a very short commit interval because it implies large overhead. Thus, I am not sure if this would help in your case, as it seems you want a more timely "response".
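If you do want to experiment with the interval, it is a regular StreamsConfig property; the application id, bootstrap servers, and the 1000 ms value here are placeholders, and a shorter interval trades throughput for faster visibility:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public final class StreamsProps {

    static Properties shorterCommitInterval() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "events-app");        // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // adjust for your cluster
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);            // default is 30000
        return props;
    }
}
```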
As an alternative, you can also disable caching on the KTable and use a toStream().process() step: after each update to the KTable, the changelog stream provided by toStream() will contain the record, and you can access the record metadata (including its offset) in the Processor via the given ProcessorContext object. This should also allow you to figure out when the record is available in the KTable.
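A sketch of that approach against the pre-2.7 Processor API; the topic name is illustrative, and newer releases use the typed api.Processor instead:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public final class TableVisibility {

    static void build(StreamsBuilder builder) {
        KTable<String, String> table = builder.table("events");

        table.toStream().process(() -> new Processor<String, String>() {
            private ProcessorContext context;

            @Override
            public void init(ProcessorContext context) {
                this.context = context;
            }

            @Override
            public void process(String key, String value) {
                // At this point the update has reached the KTable; the metadata
                // tells you which input record produced it.
                System.out.printf("KTable updated from %s-%d@%d%n",
                        context.topic(), context.partition(), context.offset());
            }

            @Override
            public void close() { }
        });
    }
}
```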

Null correlation not allowed. Maybe the CorrelationStrategy is failing?

I am using Spring Integration with the default correlation strategy, that is, I am not explicitly writing code for the correlation strategy. Everything works fine up to the splitter. After the splitter there is a service activator which does some processing and then puts the message into a channel from which the aggregator has to pick it up, but the aggregator doesn't pick it up. So I put an interceptor in place to find out what was going on, and found that before the message is put into the aggregator channel, the aggregation-related headers like the correlation id are present, but once it's put into the channel the headers are lost. Now I am not sure why the aggregator, or the channel before it, is losing the headers. Any help would be much appreciated.
UPDATE: I am using a splitter, then an activator, then another splitter, then an activator, then an aggregator, and then another aggregator... The code below is for the inner splitter and aggregator combination.
Thanks for your help.
I was finally able to solve this.
The problem was that I was passing org.json.JSONObject to and from the Spring Integration components.
The JSONObject is not serializable, and I guess the splitter and aggregator components only work with serializable objects. The simplest fix was to convert the JSONObjects to Strings by calling the toString() method on them. It would have been so much easier if the stack trace had told me that I was using a non-serializable object instead of telling me "Null correlation not allowed. Maybe the CorrelationStrategy is failing?"
I have removed the code that I had put here, to be safe.
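A rough sketch of that kind of conversion (hypothetical channel names, not the poster's removed code):

```java
import org.json.JSONObject;
import org.springframework.integration.annotation.Transformer;
import org.springframework.stereotype.Component;

@Component
public class JsonPayloadConverter {

    // org.json.JSONObject is not Serializable, so hand the splitter/aggregator
    // its String form instead.
    @Transformer(inputChannel = "rawJsonChannel", outputChannel = "splitterInputChannel")
    public String toSerializablePayload(JSONObject payload) {
        return payload.toString();
    }
}
```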

ActiveMQ: Is MessageConsumer's selector processed on the broker or client side?

Could someone please confirm if I'm right or wrong on this? It seems to me that the "selector" operation is done within the MessageConsumer implementation (i.e. ALL messages are still dispatched from the message broker to the MessageConsumer, and the "selector" operation is then performed against those messages). The problem occurs when we have a bunch of messages that we are not interested in (i.e. that don't match our selector): those messages will eventually fill up the MessageConsumer's internal queue due to the prefetch or cache limit. As a result, we will not be able to receive any new messages, particularly the ones we're interested in with the selector.
So, is there a way to configure AMQ to perform the selector operation on the message broker side? Should I start looking at interceptors and create my own BrokerPlugin? Any advice on how to work around this issue?
I really appreciate any answer.
Thanks,
Soonthorn A.
Selectors are actually applied at the broker, not on the client side. If your selector is sparse and the destination sees a lot of traffic, it's likely that the broker has not paged in messages that match the selector, and your consumer won't see any matches until more messages are consumed from the destination.
The issue lies in the destination policy in play for your broker. By default the broker will only page in 200 messages at a time, to avoid using up all available memory and to avoid impacting overall performance. You can increase this number via your own destination policy in activemq.xml; see the ActiveMQ documentation on per-destination policies.
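The relevant attribute is typically maxPageSize on a policyEntry. For an embedded broker, a rough programmatic equivalent would look like the sketch below; the value 5000 is only illustrative, and the same setting normally lives in activemq.xml:

```java
import java.util.Collections;

import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.broker.region.policy.PolicyEntry;
import org.apache.activemq.broker.region.policy.PolicyMap;

public final class BrokerConfig {

    static BrokerService brokerWithLargerPageSize() throws Exception {
        PolicyEntry policy = new PolicyEntry();
        policy.setQueue(">");            // apply to all queues
        policy.setMaxPageSize(5000);     // page in more messages so sparse selectors see matches

        PolicyMap policyMap = new PolicyMap();
        policyMap.setPolicyEntries(Collections.singletonList(policy));

        BrokerService broker = new BrokerService();
        broker.setDestinationPolicy(policyMap);
        return broker;
    }
}
```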

Resources