Existing internal topic has invalid partitions - apache-kafka-streams

When starting our Kafka Streams application in a test setup with just one Kafka broker, we see the following error in roughly 1 out of 15 runs:
org.apache.kafka.streams.errors.StreamsException: Existing internal topic alarm-message-streams-by-organization-repartition has invalid partitions: expected: 32; actual: 12. Use 'kafka.tools.StreamsResetter' tool to clean up invalid topics before processing.
When we see the error above, the actual number of partitions varies (expected is 32, actual is above 0 and below 32).
We are executing org.apache.kafka.streams.KafkaStreams#cleanUp before calling org.apache.kafka.streams.KafkaStreams#start. The Kafka broker is started without data (using https://hub.docker.com/r/wurstmeister/kafka/) for every test run.
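For reference, the startup sequence in the test is roughly the following sketch (the topology and properties here are illustrative placeholders, not our actual configuration):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.Topology;

public class StreamsTestStartup {

    // Create the Streams instance, wipe local state from any previous run, then start.
    // cleanUp() only deletes the local state directory; it does not touch the
    // internal topics on the broker.
    public static KafkaStreams startStreams(Topology topology, Properties props) {
        KafkaStreams streams = new KafkaStreams(topology, props);
        streams.cleanUp();
        streams.start();
        return streams;
    }
}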
When looking at the log for the Kafka broker we see the following:
[2018-10-22 18:41:31,373] INFO Topic creation Map(
alarm-message-streams-by-organization-repartition-19 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-22 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-0 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-7 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-23 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-1 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-24 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-2 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-30 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-5 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-21 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-8 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-14 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-15 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-6 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-16 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-31 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-25 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-9 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-20 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-29 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-13 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-26 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-17 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-4 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-10 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-3 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-11 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-12 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-28 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-27 -> ArrayBuffer(42),
alarm-message-streams-by-organization-repartition-18 -> ArrayBuffer(42)
) (kafka.zk.AdminZkClient)
It looks like the topic is created with the expected number of partitions (32). Later, in the same log, it looks like there is a request to create the topic again. We don't know why that happens, but at least the request still contains the expected number of partitions (32):
[2018-10-22 18:43:29,851] INFO [Admin Manager on Broker 42]: Error processing create topic request for topic alarm-message-streams-by-organization-repartition with arguments (numPartitions=32, replicationFactor=1, replicasAssignments={}, configs={cleanup.policy=delete, segment.bytes=52428800, segment.ms=600000, retention.ms=9223372036854775807, segment.index.bytes=52428800}) (kafka.server.AdminManager)
org.apache.kafka.common.errors.TopicExistsException: Topic 'alarm-message-streams-by-organization-repartition' already exists.
We have never seen this happen in our non-test environments, where we run with 6 Kafka brokers. However, we run a significantly higher number of test runs than deployments to non-test.
Note: It is not always the same topic that is causing the error.
The error is causing flakiness in our test setup, so we would like to understand why it happens and deal with it. Can anybody provide some insight into this Kafka Streams behavior?
We are using Kafka and Kafka Streams 2.0.0.

It seems that incomplete/incorrect metadata is received from the Kafka cluster (i.e., your single broker). On startup (or, to be more precise, in each rebalance), Kafka Streams checks whether the internal topics exist with the expected number of partitions. If a topic does not exist, it is created (this should only happen once during the lifetime of an application). If it exists with the correct number of partitions, the topic is used as-is. If the topic exists with an incorrect number of partitions, the exception you report is thrown.
Calling KafkaStreams#cleanUp() should not have any impact here. It is not the same as the StreamsResetter that you can invoke via bin/kafka-streams-application-reset.sh (cf. https://kafka.apache.org/20/documentation/streams/developer-guide/app-reset-tool.html).
I have no idea at the moment what the root cause of the issue could be, though, i.e., why Kafka Streams received incorrect topic metadata.
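As a possible mitigation in a test setup (not a fix for the root cause), you could poll the broker's topic metadata until the repartition topic reports the expected number of partitions, e.g. after a failed start and before retrying with a fresh KafkaStreams instance. A minimal sketch using the plain AdminClient; the topic name, expected count, and retry policy below are assumptions for illustration:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class TopicReadyCheck {

    // Poll the broker until the topic reports the expected partition count,
    // or give up after maxAttempts polls. Returns true if the count matched.
    static boolean waitForPartitions(String bootstrapServers, String topic,
                                     int expectedPartitions, int maxAttempts) throws InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        try (AdminClient admin = AdminClient.create(props)) {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    TopicDescription description = admin
                            .describeTopics(Collections.singletonList(topic))
                            .all().get().get(topic);
                    if (description.partitions().size() == expectedPartitions) {
                        return true;
                    }
                } catch (Exception e) {
                    // Topic may not exist yet, or its metadata may still be propagating.
                }
                Thread.sleep(500);
            }
        }
        return false;
    }
}

For example, waitForPartitions("localhost:9092", "alarm-message-streams-by-organization-repartition", 32, 60) would block the test for up to 30 seconds until the broker reports all 32 partitions. Hope this helps.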

Related

Multiple elasticsearch sinks for a single flink pipeline

My requirement is to send the data to a different ES sink (based on the data). For example, if the data contains particular info, send it to sink1, otherwise send it to sink2, etc. (basically, send it dynamically to any one sink based on the data). I also want to set parallelism separately for ES sink1, ES sink2, ES sink3, etc.
Kafka -> Map(Transformations) -> ES sink1 (parallelism 4)
                              -> ES sink2 (parallelism 2)
                              -> ES sink3 (parallelism 2)
Is there any simple way to achieve the above in Flink?
My solution (but I am not satisfied with it):
I could come up with a solution, but it involves intermediate Kafka topics that I write to (topic1, topic2, topic3) and then separate pipelines for ES sink1, ES sink2 and ES sink3. I want to avoid writing to these intermediate Kafka topics.
kafka -> Map(Transformations) -> Kafka topics (Insert into topic1,topic2,topic3 based on the data)
Kafka topic1 -> Essink1(parallelism 4)
Kafka topic2 -> Essink2(parallelism 2)
Kafka topic3 -> Essink3(parallelism 2)
You can use a ProcessFunction [1] with side outputs [2] to split the stream n ways, connect each side output stream to the appropriate sink, and then call setParallelism() [3] on each sink; a sketch follows the links below.
[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html#the-processfunction
[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/side_output.html
[3] https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#operator-level
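Here is a rough sketch of that approach. The OutputTag names and routing conditions are placeholders, and print() stands in for the real sinks; in an actual job you would call addSink() with ElasticsearchSink instances configured for your cluster:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SplitToSinks {

    // Side-output tags for records that should go to sink 2 and sink 3.
    // Records matching neither condition stay on the main output (sink 1).
    static final OutputTag<String> SINK2_TAG = new OutputTag<String>("sink2") {};
    static final OutputTag<String> SINK3_TAG = new OutputTag<String>("sink3") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source; in the question this is the Kafka source plus the map/transformations.
        DataStream<String> transformed = env.fromElements("a-record", "b-record", "c-record");

        SingleOutputStreamOperator<String> routed = transformed
                .process(new ProcessFunction<String, String>() {
                    @Override
                    public void processElement(String value, Context ctx, Collector<String> out) {
                        // Hypothetical routing rules based on the record content.
                        if (value.startsWith("b")) {
                            ctx.output(SINK2_TAG, value);
                        } else if (value.startsWith("c")) {
                            ctx.output(SINK3_TAG, value);
                        } else {
                            out.collect(value); // goes to sink 1
                        }
                    }
                });

        // Each sink gets its own parallelism, independent of the rest of the pipeline.
        routed.print("es-sink-1").setParallelism(4);
        routed.getSideOutput(SINK2_TAG).print("es-sink-2").setParallelism(2);
        routed.getSideOutput(SINK3_TAG).print("es-sink-3").setParallelism(2);

        env.execute("split-to-multiple-sinks");
    }
}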

Spring kafka consumer stops receiving message

I have a Spring microservice using Kafka.
Here are the 5 consumer config properties:
BOOTSTRAP_SERVERS_CONFIG -> <ip>:9092
KEY_DESERIALIZER_CLASS_CONFIG -> StringDeserializer.class
VALUE_DESERIALIZER_CLASS_CONFIG -> StringDeserializer.class
GROUP_ID_CONFIG -> "Group1"
MAX_POLL_INTERVAL_MS_CONFIG -> Integer.MAX_VALUE
It has been observed that when the microservice is restarted, the Kafka consumer stops receiving messages. Please help me with this.
I believe your max.poll.interval.ms is the issue. It is set to roughly 24 days!! This represents the maximum time the consumer is given between polls to process messages; the group will not be rebalanced for that long if the polling thread dies while the process stays alive! Try setting it to a smaller value than Integer.MAX_VALUE, for example 30 seconds (30000 ms).
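For example, in a typical Spring Kafka setup the consumer factory could be configured like this (only the last property differs from your current config; the bootstrap address placeholder is kept as in your question):

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConsumerFactory<String, String> consumerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "<ip>:9092"); // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "Group1");
        // 30 seconds between polls instead of Integer.MAX_VALUE; if processing a batch
        // takes longer than this, the group will rebalance and reassign the partitions.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 30000);
        return new DefaultKafkaConsumerFactory<>(props);
    }
}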

MQ - COA correlation issue after change of remote destination from Queue to Topic

Seeking advice on the COA correlation issue described below.
Background: there is an application A which is feeding data to an application B via MQ (nothing special - a remote queue def pointing to the local queue def on the remote QM), where the sending app A is requesting COAs. That has been a stable setup working for years:
App A -> QM.A[Q1] -channel-> QM.B[Q2] -> App B
Here:
Q1 is a remote queue def pointing to Q2.
Problem: there is an application C which requires exactly the same data feed that A is sending to B via MQ, so the data feed has to be duplicated, subject to the following constraint.
Constraint: neither the code nor the app config of applications A and B can be changed - the duplication of the data feed from A to B should be transparent to both applications: A keeps putting messages to the same queue Q1 on QM.A, and B keeps getting messages from the same queue Q2 on QM.B.
Proposed solution: duplicate the feed on the MQ layer by creating a Topic/subscribers configuration on the QM of app B:
App A -> QM.A[Q1] -channel-> QM.B[QA->T->{S2,S3}->{Q2,Q3}] -> {App B, QM.C[Q4] -> App C}
Here:
Q1 - has the rname property updated to point to the QA for Topic, instead of Q2
QA - Queue Alias for Topic T
T - Topic
S2, S3 - subscriptions that deliver the published data to Q2 and Q3
Q2 - unchanged, the same local queue definition where App B consumes from
Q3 - remote queue definition pointing to Q4
Q4 - local queue definition on the QM.C, the queue with copy of messages sent from A to B
With this setup, duplication of the messages from app A to apps B and C works fine.
But ... there is an issue.
Issue: application A is not able to correlate COAs and that is the problem.
I'm not sure whether app A is unable to correlate COAs at all, or (the more likely guess) whether it is only unable to correlate the additional COAs, e.g. those coming from QM.C.
Any idea or advice is very much appreciated.

Nifi Processor with multiple inputs, triggers only after receiving certain flow files

I am receiving tar files from FTP and saving them to HDFS after extracting them. So my current pipeline looks like this:
ListFTP -> FetchFTP -> UnpackContent -> PutHDFS
Each tar contains 10 files, so for a single tar file, 10 flow files are generated. My requirement is to trigger another job after 3 particular files are stored in HDFS. Which processor should I use, or is there another approach to this problem using NiFi?
I have done it using RouteOnAttribute, Notify and Wait processors. The complete flow is described below.
ListFTP -> FetchFTP -> UnpackContent -> PutHDFS -> RouteOnAttribute -> Notify -> Wait ->
RouteOnAttribute: Reroute the 3 required files to a separate queue using the following expression:
${filename:equals('file1.tsv'):or(${filename:equals('file2.tsv')}):or(${filename:equals('file3.tsv')})}
Notify: This is used to notify that a particular file is received. Set the Signal Counter Name as ${filename}.
Wait: This processor will wait until all 3 files are received and will then pass a flowfile downstream to trigger the next job.

carbon-tagger does not translate received metrics to Graphite

I have configured the monitoring system as the following chain:
my_app -> pystatsd -> statsdaemon -> carbon-tagger -> graphite (via carbon-cache) -> graph-explorer
But it looks like carbon-tagger only dumps metrics to ElasticSearch and not to Graphite. At the same time, carbon-tagger successfully sends its internal metrics to carbon-cache and they appear in Graph Explorer just fine. I have looked at the source code of carbon-tagger and could not find the place where it sends the metrics received from statsdaemon to Graphite. So now I'm confused! How should I configure my monitoring system to dump metrics both to ElasticSearch and to Graphite?
In a nutshell, the correct configuration of the described system should look like this:
my_app -> pystatsd -> statsdaemon -> carbon-relay -> {carbon-tagger -> ElasticSearch, carbon-cache -> graphite -> graph-explorer}
That is, statsd/statsdaemon should pass data to carbon-relay (or carbon-relay-ng), not to carbon-cache directly, and carbon-relay will broadcast the data to both carbon-tagger and carbon-cache. Also, don't forget that carbon-tagger doesn't work with the pickle format, while the original carbon-relay produces data only through the pickle protocol.
