Apache Storm 0.10.0: Could not get my custom metrics every timeBucketSizeInSecs

I register a custom metric in my bolt with context.registerMetric("et", _executedTuple, 2). The metric just counts the number of tuples the bolt emitted. I also register a metrics consumer in my topology.
However, I only receive the executedTuple value every ten seconds. I expected the metric to be sent every 2 seconds (the timeBucketSizeInSecs I passed).
Does anyone know how to solve this?

Related

Grafana: Get last N minutes events count

I use Spring Micrometer to count every occurrence of a specific event (using a counter).
How can I get the difference between the count now and the count N minutes ago? I need to know how many events occurred in the last N minutes.
In Grafana I can find only count, m1_rate, m5_rate, m15_rate and mean_rate.
It depends on your datasource. I don't know Micrometer, but looking at the docs, it seems it publishes metrics to Prometheus, so that is your datasource. If that's correct, you could use something like count_over_time(metric[1h]). That gives you the number of samples for that metric in the specified time interval. I think "m1_rate" and the others are metrics created by Micrometer.
This is what I was looking for: the change in the counter value over the last 10 minutes.
diffSeries(sum(path.to.metric.count),timeShift(sum(path.to.metric.count),'10min',true,false))

Input Data rate in Apache Storm

I am reading text data from a file and processing it with Apache Storm to produce results. I want to experiment with different input data rates. How can I change the input data rate in this setting? Also, is the input data rate defined as:
(number of tuples emitted by the spout) / time
By default, Storm will pull tuples out of the spout as fast as possible. You can interact with this via a few settings:
topology.max.spout.pending defines how many tuples can be emitted into the topology before Storm will throttle the spout and wait for some of the tuples to be acked. By default this is uncapped.
topology.sleep.spout.wait.strategy.time.ms defines how many milliseconds Storm will pause between calls to nextTuple on the spout, if a call to nextTuple produces no output. This is 1ms by default.
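As a rough illustration of the two settings above and of the rate definition from the question, here is a minimal sketch in plain Java. It assumes the entries are later merged into the topology configuration (Storm's Config is itself a map keyed by these strings). The class name SpoutRateSettings and the values used are hypothetical, not part of Storm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: builds the two throttling entries
// named in the answer and computes the rate defined in the question.
class SpoutRateSettings {

    // Returns a map holding the two settings; the keys are the ones quoted above.
    static Map<String, Object> throttled(int maxSpoutPending, int sleepWaitMs) {
        Map<String, Object> conf = new HashMap<>();
        // Upper bound on emitted-but-not-yet-acked tuples per spout task.
        conf.put("topology.max.spout.pending", maxSpoutPending);
        // Pause after a nextTuple call that emitted nothing.
        conf.put("topology.sleep.spout.wait.strategy.time.ms", sleepWaitMs);
        return conf;
    }

    // Input rate as defined in the question: tuples emitted divided by time.
    static double tuplesPerSecond(long emitted, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return emitted * 1000.0 / elapsedMs;
    }
}
```

Note that the pending limit can only take effect when the spout emits tuples with message IDs, because otherwise there are no acks for Storm to wait on.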

Parallelism in Apache Storm with one worker node

I am trying to parallelize my topology using Apache Storm, but I get a java.util.ConcurrentModificationException on the worker nodes when I increase the number of workers above 1. It works fine with 1 worker and in a local cluster. I want a way to parallelize my topology and measure parameters such as throughput, latency and emit rate using only one worker node.
Based on the stack trace you posted, it looks like Kryo is trying to serialize an ArrayList and hitting a ConcurrentModificationException. I would look for any place you emit an ArrayList and make sure that you don't modify it after you've passed it to OutputCollector.emit.
Likely the reason you're not seeing this issue when you only have one worker is that Storm only serializes emitted objects when they need to be sent to a different worker.
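One way to follow that advice is to emit a private copy of the list, so that later changes to the working buffer cannot race with serialization. The sketch below is plain Java with no Storm dependency. EmitSnapshot is a hypothetical helper name, and the copy would be the object handed to OutputCollector.emit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: take a snapshot of a mutable buffer before emitting it.
class EmitSnapshot {

    // Returns an independent ArrayList with the same elements.
    // The copy is shallow: the elements themselves are shared, so they
    // should also not be mutated after the emit.
    // If another thread can modify the buffer, the copy itself must be
    // made under the same lock that guards those modifications.
    static <T> List<T> of(List<T> buffer) {
        return new ArrayList<>(buffer);
    }
}
```

With this in place, the bolt can keep appending to or clearing its own buffer after the emit, because the emitted list is no longer the same object.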

Wait for submitTopology to finish

I am reading the book Storm Applied. I found the following code snippet in the book:
LocalCluster lc = new LocalCluster();
lc.submitTopology("GitHub-commit-count-topology", config, topology);
Utils.sleep(TEN_MINUTES);
lc.killTopology("GitHub-commit-count-topology");
lc.shutdown();
This code submits the topology for execution, waits for a fixed 10 minutes, and then kills the topology. That seems odd. How can I instead wait for the submitted topology to complete, and only then kill it and shut down?
In Akka Streams, for example, we get a Future[Done] and simply wait on that future to complete, rather than waiting a fixed 10 minutes.
You can do this with the completeTopology method in Testing: https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/Testing.java#L376.
The reason this isn't used in some cases is that it requires every spout in the topology to implement the CompletableSpout interface https://github.com/apache/storm/blob/4137328b75c06771f84414c3c2113e2d1c757c08/storm-client/src/jvm/org/apache/storm/testing/CompletableSpout.java.
Most Storm spouts never reach a point where they're "done" (since it's a stream processing framework, not a batch processing framework), so there's no way to tell when the topology is finished. For example, if you're consuming messages from a Kafka topic, the producers may at any point add more messages to the topic, so how will the consumer determine it is finished consuming?
CompletableSpout exists mostly to ease testing, because it's then possible for a spout to say whether it is done. The completeTopology method I linked can then use this extra feature to tell whether all spouts in the topology are "done", and can stop the topology after that.
If the spout you're using in a test doesn't implement CompletableSpout (which most spouts don't), there's no way to tell in general when the topology is finished. In many cases you can still do better than the example you posted, e.g. if my topology is supposed to write 10 messages to a queue in the test, I can make the test end once 10 messages have been written to the queue.
To relate this to Akka Streams: I'm not really familiar with them, but looking at the introductory documentation, you could consider CompletableSpouts to be similar to bounded Sources (e.g. a Source(1 to 100)), while "normal" spouts are unbounded Sources (e.g. a Source.repeat(1)).
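The "end the test once 10 messages have been written" idea can be sketched as a small polling helper that replaces the fixed sleep. This is plain Java with no Storm dependency. Await, its parameters, and the queue mentioned afterwards are hypothetical names used only for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition until it holds or a timeout expires.
class Await {

    // Returns true as soon as the condition holds, or false if the timeout
    // elapses or the waiting thread is interrupted first.
    static boolean until(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

In a test, a call such as Await.until(() -> queue.size() >= 10, 60_000, 100) would sit between submitting and killing the topology, where queue stands for whatever sink the test can observe. The timeout still bounds the test if the expected output never arrives.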

How would you emit storm data after a period of time has lapsed?

For example, let's say you were using Storm to aggregate web visit start and end dates. A session starts with the first visit from a user and ends after 30 minutes of inactivity from that same user. This data is being streamed into Storm in real time as it's collected. How would you tell Storm to emit data after those 30 minutes of inactivity?
I am not sure, but you can look at the TOPOLOGY_TICK_TUPLE_FREQ_SECS property in Storm. As described in this article:
Tick tuples: It’s common to require a bolt to “do something” at a fixed interval, like flush writes to a database. Many people have been using variants of a ClockSpout to send these ticks. The problem with a ClockSpout is that you can’t internalize the need for ticks within your bolt, so if you forget to set up your bolt correctly within your topology it won’t work correctly. 0.8.0 introduces a new “tick tuple” config that lets you specify the frequency at which you want to receive tick tuples via the “topology.tick.tuple.freq.secs” component-specific config, and then your bolt will receive a tuple from the __system component and __tick stream at that frequency.
You can also find sample code for configuring a spout or bolt to receive tick tuples at a specific interval.
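To make the tick-tuple idea concrete for the session question, here is a minimal sketch of the bookkeeping a bolt could keep, written in plain Java with no Storm dependency. It assumes the bolt calls onVisit for each normal tuple and onTick whenever a tuple arrives from the __system component on the __tick stream. SessionTracker and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical session bookkeeping driven by visit events and periodic ticks.
class SessionTracker {

    static final class Session {
        final String user;
        final long startMs;
        long lastSeenMs;

        Session(String user, long startMs) {
            this.user = user;
            this.startMs = startMs;
            this.lastSeenMs = startMs;
        }
    }

    private final long timeoutMs;
    private final Map<String, Session> open = new HashMap<>();

    SessionTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called for every visit: opens a session or extends the existing one.
    void onVisit(String user, long timestampMs) {
        Session s = open.get(user);
        if (s == null) {
            open.put(user, new Session(user, timestampMs));
        } else {
            s.lastSeenMs = Math.max(s.lastSeenMs, timestampMs);
        }
    }

    // Called on every tick: removes and returns sessions that have been
    // inactive for at least timeoutMs. The bolt would emit these.
    List<Session> onTick(long nowMs) {
        List<Session> expired = new ArrayList<>();
        Iterator<Session> it = open.values().iterator();
        while (it.hasNext()) {
            Session s = it.next();
            if (nowMs - s.lastSeenMs >= timeoutMs) {
                expired.add(s);
                it.remove();
            }
        }
        return expired;
    }
}
```

With a 30-minute timeout, the tick frequency determines how late a session can be closed: a tick every 60 seconds means a session is emitted at most about a minute after it actually expired. This sketch keeps all state in memory, so sessions are lost if the worker restarts.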

Resources