Apache Storm tuple timeout questions - apache-storm

I'm trying to understand the state of the topology in the case of a tuple processing timeout (not in Trident mode).
Let's assume that while some bolt is processing a tuple, the timeout threshold is reached. In that case the spout emits the initial tuple again (with the same message id, as I understand it). Now let's say the bolt then finishes processing the tuple, emits, and acks it.
In that scenario:
1. Will the failed tuple still continue to be processed by the topology even though the spout emitted a new initial tuple?
2. If so, what will the acker's DAG of tuples look like (since there is a new DAG created with the same initial tuple id)? What will happen to the previous, original DAG?
3. What will happen when the acker receives an ack or an emit with anchor ids from the previous DAG?

1: Yes, the failed tuple continues. The reason for this is that it would be too expensive to try to stop the failed tuple from continuing, as the spout would need to tell all the bolts about the failure.
2: I think there's a small misunderstanding here. When the spout emits the tuple, the message id is not what Storm uses to track that tuple DAG/tree internally. Instead, the spout executor generates a random id (call it rootId), and locally stores the mapping of rootId -> messageId. The message id never leaves the spout executor, and isn't propagated to the bolts.
When the spout executor sends the tuple onward, it includes the rootId. The rootId is what is used by the acker and bolts to identify the tuple tree.
Finally when the tree is fully acked, or a tuple fails, the spout executor is told that the relevant rootId succeeded or failed, and it looks up the original messageId in its local mapping.
Since a new emit with the same messageId gets a new rootId, there is no relation between the failed and new tuples. They are considered completely separate by Storm.
I simplified the above a bit for clarity. In order to handle a spout emitting to multiple bolts, there's another set of random ids (anchorIds) involved. Conceptually you can think of a situation where you have
spout -> bolt1
      -> bolt2
as being handled as if the topology were
spout -> splitterBolt -> bolt1
                      -> bolt2
3: Let's say your tuple has timed out. The spout executor has been told that the rootId has failed. When that happens, the spout executor calls spout.fail(msgId), and then deletes the mapping in the rootId -> messageId map.
When the acker receives the ack, it might send the ack on to the spout, if the tree is fully acked. When the spout receives the ack, it has nothing matching the rootId stored, so the ack is ignored.
If you're interested in taking a look at the code, it can be found at https://github.com/apache/storm/blob/b48e10559b65e834884d59887b30fc86d2988c20/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutOutputCollectorImpl.java#L109. The rootId -> messageId mapping is called pending.
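To make the bookkeeping concrete, here is a minimal conceptual sketch of that pending map. It is not the actual Storm implementation linked above; the class and method names are illustrative only.

import java.util.HashMap;
import java.util.Map;
import org.apache.storm.spout.ISpout;

// Illustrative stand-in for the spout executor's bookkeeping (not real Storm code).
class RootIdTracker {
    private final Map<Long, Object> pending = new HashMap<>(); // rootId -> messageId

    void onEmit(long rootId, Object messageId) {
        // A re-emit of the same messageId gets a brand new rootId,
        // so the old and new trees never share an entry here.
        pending.put(rootId, messageId);
    }

    void onTreeFailed(long rootId, ISpout spout) {
        Object messageId = pending.remove(rootId);
        if (messageId != null) {
            spout.fail(messageId); // e.g. after a timeout
        }
    }

    void onTreeAcked(long rootId, ISpout spout) {
        Object messageId = pending.remove(rootId);
        if (messageId != null) {
            spout.ack(messageId);
        }
        // If the tree already failed, remove() returns null and the late ack
        // is simply ignored, which is the behaviour described in answer 3.
    }
}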

Related

[Storm] What happens to other tuples anchored to the same message id if one of them fails?

If one of the tuples anchored to a message id fails, will the other tuples be processed completely, or are they stopped by Storm?
The other tuples anchored to the same message id will continue processing as normal. Storm will fail the message id at the spout immediately, which will likely cause the spout to retry the root tuple.
The reason for this behavior is that it would be difficult/expensive to make the spout try to tell all the bolts that might be processing a tuple anchored to a failed message id that the tuple has already failed.
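For illustration, here is a minimal sketch of a bolt that anchors several output tuples to one input tuple; the class, stream, and field names are made up. If any one of the anchored tuples later fails downstream, the whole tree (and hence the spout message id) fails, but the other anchored tuples keep flowing through the topology as described above.

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext ctx, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Every emitted word is anchored to the same input tuple, so they all
        // belong to the same tuple tree rooted at the spout's message id.
        for (String word : input.getStringByField("sentence").split(" ")) {
            collector.emit(input, new Values(word));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}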

Storm Topology: proper way to ack when two bolts have same source bolt

I'm fairly new to Storm, and recently changed my bolts to inherit from IRichBolt
instead of BaseBasicBolt, meaning I am now in charge of acking and failing
a tuple according to my own logic.
My topology looks like this:
Bolt A emits the same tuple to Bolts B and C, each persist data to Cassandra.
These operations are not idempotent, and include an update to two different counter column families.
I am only interested in failing the tuple and replaying it on certain exceptions from Cassandra (not read/write timeouts, only QueryConsistency or Validation exceptions).
The problem is that if bolt B fails, the same tuple is replayed from the spout and is emitted again to bolt C, which has already successfully persisted its data, creating incorrect data.
I've tried to understand how exactly acking is done (from reading: http://www.slideshare.net/andreaiacono/storm-44638254) but failed to understand
what happens in the situation I described above.
The only ways I can see to solve this correctly are either to create another spout reading the same input source (Spout 1 -> Bolt A -> Bolt B, and Spout 1' -> Bolt A' -> Bolt C'), or to persist the data for both column families in the same batch statement by combining Bolts B and C into one.
Is this correct or am I missing something? And Is there another possible solution to properly ack these tuples?
Thanks.
You didn't say how long you want to wait before retrying a failed update in bolt B or C, but instead of outright failing the tuple in bolt B, you could add some more streams. Add a "scorpion-tail" output stream from bolt B back to the same bolt B. If an update in bolt B fails, write the tuple to the scorpion-tail output stream so it comes right back as input into bolt B, just on a second stream. You could enrich the tuple with a timestamp so that the processing logic in bolt B for the new stream can look at the last attempted time; if enough time hasn't passed, you write it out to the scorpion-tail stream again. Of course you'd do the same thing for bolt C.
If you want to wait a long time to retry the tuple (long in Storm terms), you could replace those scorpion-tail streams with Kafka topics along with the requisite spouts.
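A rough sketch of the scorpion-tail idea follows, assuming a bolt B that persists to Cassandra; the stream name, retry delay, and the tryWriteToCassandra helper are all hypothetical, not part of any existing API.

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class BoltB extends BaseRichBolt {
    static final String RETRY_STREAM = "retry";   // illustrative stream name
    static final long RETRY_DELAY_MS = 5_000L;    // illustrative retry delay
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext ctx, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        long lastAttempt = RETRY_STREAM.equals(input.getSourceStreamId())
                ? input.getLongByField("lastAttemptMs") : 0L;
        if (System.currentTimeMillis() - lastAttempt < RETRY_DELAY_MS) {
            // Came back around too soon: send it round the loop again instead of blocking.
            collector.emit(RETRY_STREAM, input, new Values(input.getValueByField("row"), lastAttempt));
        } else if (!tryWriteToCassandra(input)) {
            // Update failed: loop the tuple back to this bolt with a fresh timestamp
            // instead of failing it, so bolt C's already-persisted write is not replayed.
            collector.emit(RETRY_STREAM, input, new Values(input.getValueByField("row"), System.currentTimeMillis()));
        }
        collector.ack(input);
    }

    private boolean tryWriteToCassandra(Tuple input) {
        return true;   // placeholder for the real counter update
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream(RETRY_STREAM, new Fields("row", "lastAttemptMs"));
    }
}

The topology wiring would then subscribe bolt B to its own retry stream in addition to bolt A's output, e.g. builder.setBolt("boltB", new BoltB()).shuffleGrouping("boltA").shuffleGrouping("boltB", BoltB.RETRY_STREAM).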

How to handle ACKing in storm with multiple bolts reading from the same spout

My topology looks like this :
Data_Enrichment_Persistence_Topology
So basically the problem I am trying to solve here is that every time an issue occurs in the Stop or Load service bolts and a tuple fails, it is replayed and the spout re-emits it. This makes the Cassandra bolt reprocess the tuple and rewrite data.
I cannot make the tuples in the Load and Stop bolts unanchored, as I need them to be replayed in case of any failure. However, I only want the upper workflow to be replayed.
I am using a KafkaSpout to emit data (it is emitting on the "default" stream). I'm not sure how to duplicate the streams at the KafkaSpout's emit level.
If I can duplicate the streams, a replay on either of the two will only re-emit the message on that particular stream at the spout level, leaving the other stream untouched, right?
TIA!
You need to use two output streams in your Spout -- one for each downstream path. Furthermore, you emit each tuple to both streams (using a different message id for each).
Thus, if one fails, you can replay that tuple on just that stream.
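The stock KafkaSpout does not do this out of the box, so purely as an illustration of the emit pattern, here is a sketch with a custom spout; the stream names, field name, and fetchNextRecord helper are hypothetical.

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class DualStreamSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext ctx, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String record = fetchNextRecord();   // hypothetical source read
        if (record == null) { return; }
        // Same payload, two streams, two message ids: each tuple tree is tracked separately.
        collector.emit("stream-load", new Values(record), record + "#load");
        collector.emit("stream-stop", new Values(record), record + "#stop");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("stream-load", new Fields("payload"));
        declarer.declareStream("stream-stop", new Fields("payload"));
    }

    private String fetchNextRecord() {
        return null;   // placeholder for the real source
    }
}

Each branch of the topology then subscribes to one of the two streams, so a fail() only replays the message id belonging to that branch.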

Do I need Kafka to have a reliable Storm spout?

As I understand things, ZooKeeper will persist tuples emitted by bolts so if a bolt crashes (or a computer with the bolt crashes, or the entire cluster crashes), the tuple emitted by the bolt will not be lost. Once everything is restarted, the tuples will be fetched from ZooKeeper, and everything will continue on as if nothing bad ever happened.
What I don't yet understand is if the same thing is true for spouts. If a spout emits a tuple (i.e., the emit() function within a spout is executed), and the computer the spout is running on crashes shortly thereafter, will that tuple be resurrected by ZooKeeper? Or do we need Kafka in order to guarantee this?
P.S. I understand that the tuple emitted by the spout must be assigned a unique ID in the call to emit().
P.P.S. I see sample code in books that uses something like ConcurrentHashMap<UUID, Values> to track which spouted tuples have not yet been acked. Is this somehow automatically persisted with ZooKeeper? If not, then I shouldn't really be doing that, should I? What should I be doing instead? Using Kafka?
Florian Hussonnois answered my question thoroughly and clearly in this storm-user thread. This was his answer:
Actually, the tuples aren't persisted into ZooKeeper. If your spout emits a tuple with a unique id, it will automatically be tracked internally by Storm (i.e. by the ackers). Thus, if the emitted tuple fails because of a bolt failure, Storm invokes the method 'fail' on the originating spout task with the unique id as the argument. It's then up to you to re-emit the failed tuple.
In sample code, spouts use a Map to track which tuples have been fully processed by your entire topology, in order to be able to re-emit them in case of a bolt failure.
However, if the failure doesn't come from a bolt but from your spout, the in-memory Map will be lost and your topology will not be able to re-emit failed tuples.
For such a scenario you can rely on Kafka. In fact, the Kafka spout stores its read offset in ZooKeeper. That way, if a spout task goes down, it will be able to read its offset from ZooKeeper after restarting.
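As a sketch of the pattern described in that answer (all names are illustrative), a spout might track pending tuples in an in-memory map and re-emit them on fail(). Note that the map lives only in the spout's JVM, which is exactly why a spout crash loses it and why a replayable source such as Kafka is needed for stronger guarantees.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ReplayingSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final Map<UUID, Values> pending = new ConcurrentHashMap<>();

    @Override
    public void open(Map<String, Object> conf, TopologyContext ctx, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        Values values = readFromSource();   // hypothetical source read
        if (values == null) { return; }
        UUID msgId = UUID.randomUUID();
        pending.put(msgId, values);         // remember it until the tree is fully acked
        collector.emit(values, msgId);
    }

    @Override
    public void ack(Object msgId) {
        pending.remove(msgId);              // tree fully processed, forget it
    }

    @Override
    public void fail(Object msgId) {
        Values values = pending.get(msgId);
        if (values != null) {
            collector.emit(values, msgId);  // re-emit; lost forever if this JVM had crashed
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }

    private Values readFromSource() {
        return null;   // placeholder for the real source
    }
}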

Optional stream in Storm topology

We have a fairly simple Storm topology with one headache.
One of our bolts can either find the data it is processing to be valid, in which case everything carries on downstream as normal, or it can find it to be invalid but fixable, in which case we need to send it for some additional processing.
We tried making this step part of the topology with a separate bolt and stream.
declarer.declareStream(NORMAL_STREAM, getStreamFields());
declarer.declareStream(ERROR_STREAM, getErrorStreamFields());
Followed by something like the following at the end of the execute method.
if (errorOutput != null) {
    collector.emit(ERROR_STREAM, input, errorOutput);
} else {
    collector.emit(NORMAL_STREAM, input, output);
}
collector.ack(input);
This does work; however, it has the breaking side effect of causing all of the tuples that do not go down the error path to fail and get re-sent by the spout endlessly.
I think this is because the error bolt cannot send acks for messages it doesn't receive, but the acker waits for all the bolts in a topology to ack before sending the ack back to the spout. At the very least, taking out the error processing bolt causes everything to get acked back to the spout correctly.
What is the best way to achieve something like this?
It's possible that the error bolt is slower than you suspect, causing a backup on error_stream which, in turn, causes a backup into your first bolt and finally causing tuples to start timing out. When a tuple times out, it gets resent by the spout.
Try:
Increasing the timeout config (topology.message.timeout.secs),
Limiting the number of in-flight tuples from the spout (topology.max.spout.pending), and/or
Increasing the parallelism count for your bolts
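As a sketch of the first two knobs (the values are examples only), they can be set on the topology Config; parallelism is set when wiring the bolts into the TopologyBuilder.

import org.apache.storm.Config;

public class TopologyTuning {
    public static Config tunedConfig() {
        Config conf = new Config();
        conf.setMessageTimeoutSecs(120);   // topology.message.timeout.secs: more headroom before a tuple times out
        conf.setMaxSpoutPending(500);      // topology.max.spout.pending: cap in-flight tuples from the spout
        return conf;
    }
    // Bolt parallelism is the third argument to setBolt, e.g.:
    // builder.setBolt("errorBolt", new ErrorBolt(), 4).shuffleGrouping("firstBolt", ERROR_STREAM);
}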
