Using KafkaSpout, ack-ing a tuple twice causes timeouts? - apache-storm

My topology uses the default KafkaSpout implementation. In some very controlled testing, I noticed the spout was failing tuples even though none of my bolts were failing any tuples and I was certain all messages were being fully processed well within my configured timeout.
I also noticed that (due to some sub-classing structure with my bolts), one of my bolts was ack-ing tuples twice. When I fixed this, the spout stopped failing tuples.
Sorry that this is more of a sanity check than a question, but does this make sense? I don't see why ack-ing the same tuple instance twice would cause the spout to register timeouts, but that seems to be what happened in my case.

It does make sense.
Storm tracks all of the acks (direct and indirect) for a tuple emitted by a spout in an odd but effective manner. I'm not sure of the exact algorithm, but it involves repeatedly XOR'ing the original spout-emitted tuple ID with the IDs of subsequently anchored tuples. Each of those subsequent IDs is XOR'ed twice: once when the tuple is anchored and once when the tuple is acked. When the result of all the XORs is zero, the assumption is that each anchor was matched by exactly one ack and the original spout-emitted tuple has finished processing.
By ack'ing some tuples more than once, you made it look as though some of the spout-emitted tuples had not finished processing (because an odd number of XORs of the same ID can never zero out).
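To make the arithmetic concrete, here is a minimal sketch of the XOR bookkeeping (deliberately simplified, not Storm's actual acker code; all the IDs are made-up values). Each anchor ID is XOR'ed in once when the downstream tuple is emitted and once when it is acked, so a duplicate ack leaves a value that can never return to zero:

public class AckXorSketch {
    public static void main(String[] args) {
        long rootId = 0xCAFEBABEL;   // made-up spout tuple ID
        long anchorA = 0x1111L;      // made-up IDs of tuples anchored downstream
        long anchorB = 0x2222L;

        // Correct flow: each ID is XOR'ed in an even number of times.
        long ackVal = rootId;        // spout emit registers the root ID
        ackVal ^= anchorA;           // bolt emits tuple A anchored to the root
        ackVal ^= anchorB;           // bolt emits tuple B anchored to the root
        ackVal ^= anchorA;           // A is acked
        ackVal ^= anchorB;           // B is acked
        ackVal ^= rootId;            // the spout tuple itself is acked
        System.out.println(ackVal == 0);   // true: the tuple tree is considered complete

        // Buggy flow: tuple A is acked twice, so its ID is XOR'ed an odd number
        // of times and the running value can never return to zero.
        long buggyVal = rootId;
        buggyVal ^= anchorA;         // emit A
        buggyVal ^= anchorA;         // first ack of A
        buggyVal ^= anchorA;         // duplicate ack of A
        buggyVal ^= rootId;          // spout tuple acked
        System.out.println(buggyVal == 0); // false: Storm eventually times the tuple out
    }
}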

Related

Tuples failing at the spout, and it seems they are not even reaching the bolt

I have a topology that has been running for a few days now, and it started failing tuples in the last couple of days. From the logs it seems that the tuples are not reaching the bolts; attached is the Storm UI screenshot.
I am ack'ing the tuples in a finally block in my code, so there should be no case of un-acked tuples, and the timeout is set at 10 seconds, which is quite a bit higher than the times shown on the UI.
Any hints?
The log you're seeing is simply the Kafka spout telling you that it has fallen too far behind, and it has started skipping tuples.
I believe only acked tuples count for the complete latency metric https://github.com/apache/storm/blob/a4afacd9617d620f50cf026fc599821f7ac25c79/storm-client/src/jvm/org/apache/storm/stats/SpoutExecutorStats.java#L54. Failed tuples don't (how would Storm know the actual latency of tuples that time out?), so the complete latency you're seeing is only for the initial couple of acked tuples.
I think what's happening is that your tuples are reaching the bolt, and then either you're not acking them (or acking them more than once), or the tuples are taking too long to process so they time out while queued up for the bolt. Keep in mind that the tuple timeout starts when the spout emits the tuple, so time spent in the bolt's input queue counts. Since your initial couple of tuples are taking a while to process, I think the bolt queue gets backed up with tuples that are already timed out. The bolt doesn't discard tuples that are timed out, so the queued timed out tuples are preventing fresh tuples from being processed in time.
I'd raise the tuple timeout, and also cap the number of pending tuples by setting topology.max.spout.pending to whatever you think is reasonable (something like the number of tuples you think you can process within the timeout).
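For reference, a minimal sketch of setting those two knobs through the Java API using org.apache.storm.Config; the numbers are placeholders to illustrate the calls, not recommendations for any particular workload:

import org.apache.storm.Config;

public class TimeoutTuningSketch {
    // Returns a Config with the two settings discussed above.
    public static Config tunedConfig() {
        Config conf = new Config();
        conf.setMessageTimeoutSecs(60);   // topology.message.timeout.secs
        conf.setMaxSpoutPending(500);     // topology.max.spout.pending (per spout task)
        return conf;
    }
}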

With Storm, is there significant benefit to anchoring my tuples in this case?

During my stream, my tuples do not split in any way. One action is eventually performed for each tuple that comes in.
I can still fail them if they run into some sort of exception that they might overcome if replayed by my KafkaSpout. Though I don't know how my spout knows which tuple to replay when they're not anchored, in testing it seems to replay the right one. Is this expected: does the KafkaSpout implementation track tuples/messages in some way I'm not aware of? Am I possibly anchoring and not realizing it (my bolts extend BaseRichBolt)? Or am I just mistaken that it replays the correct one?
But if manually failing does work, then I believe the only benefit I get from anchoring is that my tuple will be replayed when it times out -- which I'm not sure is worth the overhead of anchoring.
Am I correct about this? Is there some other significant benefit to anchoring in this case?
BaseRichBolt does not do any anchoring automatically (BaseBasicBolt would do this). Thus, the behavior you describe should only work if you have a simple Spout -> Bolt topology. For deeper topologies, i.e., Spout -> Bolt1 -> Bolt2 with no anchoring in Bolt1, failing tuples in Bolt2 cannot work.
With KafkaSpout, each emitted tuple gets a MessageId assigned, so the fault-tolerance mechanism is activated. Thus, each tuple must get acked in the first bolt receiving it; otherwise, the tuples eventually time out. Tuples emitted in Bolt1 should also get anchored (otherwise, those tuples are not tracked, cannot fail, neither manually in Bolt2 nor via timeout, and cannot get replayed in case of failure).
Thus, anchoring is purely a fault-tolerance mechanism. You should actually always anchor tuples, because anchoring by itself does not enable fault-tolerance; assigning MessageIds in the spout is what enables it. If a bolt processes a tuple that does not have an ID assigned, the anchoring call does "nothing" and the overhead of an additional method call is tiny. Therefore, adding anchoring code is usually a good choice, because the bolt can then be used with or without fault-tolerance enabled (depending on whether the spout assigns message IDs or not). If you omit the anchoring code, fault-tolerance breaks at this bolt and downstream tuples cannot be recovered on failure.
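To illustrate, here is a sketch of a Bolt1-style BaseRichBolt that anchors its output (written against the Storm 2.x API; the field names and the toUpperCase transformation are made up for the example):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class AnchoringBoltSketch extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");

        // Anchored emit: the new tuple is tied to the spout tuple's tree, so a
        // fail() or timeout downstream causes the spout to replay the message.
        collector.emit(input, new Values(word.toUpperCase()));

        // The unanchored alternative, collector.emit(new Values(...)), would break
        // the tree here: downstream failures could no longer trigger a replay.

        collector.ack(input);  // BaseRichBolt: ack exactly once, after emitting
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("upperWord"));
    }
}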

Apache Storm: what happens to a tuple when no bolts are available to consume it?

If it's linked to another bolt, but no instances of the next bolt are available for a while, how long will it hang around? Indefinitely? Long enough?
What about when many tuples are waiting because there is a line or queue for the next available bolt? Will they merge? Will bad things happen if too many get backed up?
By default, tuples will time out 30 seconds after being emitted; you can change this value (topology.message.timeout.secs), but unless you know what you are doing, don't.
Failed and timed-out tuples will be replayed by the spout if the spout is reading from a reliable data source (e.g. Kafka); that is, Storm has guaranteed message processing. If you are coding your own spouts, you might want to dig deep into this.
You can tell you are having timed-out tuples from the Storm UI: tuples are failing on the spout but not on the bolts.
You don't want tuples to time out inside your topology (for example, there is a performance penalty on Kafka for not reading sequentially). You should adjust your topology's capacity to process tuples, that is, tweak the bolt parallelism by changing the number of executors (see the sketch below), and set the parameter topology.max.spout.pending to a reasonably conservative value.
Increasing the topology.message.timeout.secs parameter is not a real solution, because sooner or later, if the capacity of your topology is not enough, the tuples will start to fail.
topology.max.spout.pending is the maximum number of tuples that can be waiting. The spout will emit more tuples as long as the number of tuples not yet fully processed is less than the given value. Note that topology.max.spout.pending is per spout (each spout has its own internal counter and keeps track of the tuples that are not fully processed).
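As mentioned above, the main capacity lever is bolt parallelism. A minimal wiring sketch; the component names, executor counts, and the passed-in spout/bolt instances are all placeholders:

import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;

public class CapacitySketch {
    // Wires a spout into a bolt with explicit parallelism hints.
    public static TopologyBuilder wire(IRichSpout spout, IRichBolt bolt) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("source", spout, 2);   // 2 spout executors
        builder.setBolt("worker", bolt, 8)      // 8 bolt executors to raise processing capacity
               .shuffleGrouping("source");
        return builder;
    }
}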
There is a deserialize queue for buffering incoming tuples. If things hang long enough, the queue will fill up, and tuples will be lost if you don't use the ack mechanism to make sure they get resent.
Storm just drops tuples that are not fully processed within the timeout (the default is 30 seconds).
After that, Storm calls the fail(Object msgId) method of the spout. If you want to replay the failed tuples, you should implement this method. You need to keep the tuples in memory, or in another reliable storage system such as Kafka, in order to replay them.
If you do not implement the fail(Object msgId) method, Storm just drops them.
Reference: https://storm.apache.org/documentation/Guaranteeing-message-processing.html
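For illustration, a minimal sketch of a spout that keeps in-flight messages in memory and re-emits them from fail(Object msgId). It is written against the Storm 2.x API, and fetchNextMessage() is a hypothetical stand-in for whatever source you actually read from:

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ReplayingSpoutSketch extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final Map<Object, String> pending = new HashMap<>();  // in-flight messages by ID

    @Override
    public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String message = fetchNextMessage();   // hypothetical read from your source
        if (message == null) {
            return;
        }
        String msgId = UUID.randomUUID().toString();
        pending.put(msgId, message);                  // remember it until acked
        collector.emit(new Values(message), msgId);   // the message ID turns on ack/fail tracking
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);   // fully processed, forget it
    }

    @Override
    public void fail(Object msgId) {
        String message = pending.get(msgId);
        if (message != null) {
            collector.emit(new Values(message), msgId);   // replay the failed message
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }

    private String fetchNextMessage() {
        return null;   // stand-in: pull the next message from your queue or storage here
    }
}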

Apache Storm Tuple Replay Count

I'm using Apache Storm for parallel processing. I'd like to detect when a tuple is on its last replay so that, if it fails again, it can be moved to a dead-letter queue.
Is there a way to find the replay count from within the Bolt? I'm not able to find such a field within the tuple.
The reason I'm looking for the last replay count is to make our topology more resilient to failures caused by bugs and downstream service outages. When the bug/downstream issue has been resolved, the tuples can be reprocessed from the dead-letter queue. However, I'd like to place a tuple on the dead-letter queue only on its last and final replay.
There are multiple possible answers to this question:
Do you use the low-level Java API to define your topology? If yes, see here: Storm: Is it possible to limit the number of replays on fail (Anchoring)?
You can also use transactional topologies. The documentation is here: https://storm.apache.org/documentation/Transactional-topologies.html
Limiting the number of replays implies counting the number of replays, and that's a requirement to get this done. However, Storm does not support a dead-letter queue or anything similar natively. You would need to use a reliable external distributed storage system (maybe Kafka) and put the tuple there if the replay count exceeds your threshold. In your spout, you would then check periodically for tuples in this external storage. If they have been stored there "long enough" (whatever that means in your application), the spout can try reprocessing them.
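A sketch of the bookkeeping that counting replays could involve on the spout side; Storm itself does not expose a replay count to bolts, and sendToDeadLetterStore() is a hypothetical hook for whatever external storage (e.g. a Kafka topic) you choose:

import java.util.HashMap;
import java.util.Map;

public class ReplayCountTracker {
    private static final int MAX_REPLAYS = 3;   // assumed threshold, pick your own
    private final Map<Object, Integer> replays = new HashMap<>();

    // Called from the spout's fail(msgId); returns true if the tuple should be re-emitted.
    public boolean onFail(Object msgId, String message) {
        int count = replays.merge(msgId, 1, Integer::sum);
        if (count > MAX_REPLAYS) {
            replays.remove(msgId);
            sendToDeadLetterStore(message);   // park it for later reprocessing
            return false;                     // give up: do not re-emit
        }
        return true;                          // below threshold: re-emit
    }

    // Called from the spout's ack(msgId).
    public void onAck(Object msgId) {
        replays.remove(msgId);
    }

    private void sendToDeadLetterStore(String message) {
        // stand-in: write to your dead-letter topic or table here
    }
}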

How do I implement this topology in Storm?

I'm new to Storm, so be gentle :-)
I want to implement a topology that is similar to the RollingTopWords topology in the Storm examples. The idea is to count the frequency of words emitted. Basically, the spouts emit words at random, the first level bolts count the frequency and pass them on. The twist is that I want the bolts to pass on the frequency of a word only if its frequency in one of the bolts exceeded a threshold. So, for example, if the word "Nathan" passed the threshold of 5 occurrences within a time window on one bolt then all bolts would start passing "Nathan"'s frequency onwards.
What I thought of doing is having another layer of bolts which would have the list of words which have passed a threshold. They would then receive the words and frequencies from the previous layer of bolts and pass them on only if they appear in the list. Obviously, this list would have to be synchronized across the whole layer of bolts.
Is this a good idea? What would be the best way of implementing it?
Update: What I'm hoping to achieve is a situation where communication is minimized, i.e. each node in my use case is simulated by a spout and an attached bolt which does the local counting. I'd like that bolt to emit only words that have passed a threshold, either in the bolt itself or in another one. So every bolt will have to have a list of words that have passed the threshold. There will be a central repository that holds the list of words over the threshold and communicates with the bolts to pass that information on.
What would be the best way of implementing that?
That shouldn't be too complicated. Just don't emit the words until you reach the threshold and in the meantime keep them stored in a HashMap. That is just one if-else statement.
About the synchronization: I don't think you need it, because with this kind of problem (counting words) you want one and only one task to receive a specific word. The one task that receives the word (e.g. "Nathan") will be the only one emitting its frequency. For that you should use fields grouping.
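For illustration, a sketch of such a counting bolt (Storm 2.x API; the threshold, field names, and stream layout are assumptions based on the question):

import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ThresholdCountBoltSketch extends BaseRichBolt {
    private static final long THRESHOLD = 5;   // assumed threshold from the question
    private final Map<String, Long> counts = new HashMap<>();
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        long count = counts.merge(word, 1L, Long::sum);
        if (count >= THRESHOLD) {
            // Only words that have crossed the threshold are passed downstream.
            collector.emit(input, new Values(word, count));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}

Wiring it with something like builder.setBolt("counter", new ThresholdCountBoltSketch(), 4).fieldsGrouping("words", new Fields("word")) is what guarantees all occurrences of a given word reach the same task, so the per-task HashMap is sufficient and no cross-task synchronization is needed.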
