Kafka streams app not streaming/consuming data after idle time - apache-kafka-streams

I have a Kafka Streams app that uses the Processor API:
@Bean
@Primary
public KafkaStreams kafkaStreams() {
    log.info("Create Kafka Stream Bean with defined topology");
    Topology topology = this.buildTopology(new StreamsBuilder());
    final KafkaStreams kafkaStreams = new KafkaStreams(topology, createConfigurationProperties());
    kafkaStreams.cleanUp();
    kafkaStreams.start();
    return kafkaStreams;
}
private Topology buildTopology(StreamsBuilder streamsBuilder) {
    Topology topology = streamsBuilder.build();
    StoreBuilder<KeyValueStore<String, ParticipantObj>> stateStoreBuilder =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("balance"), keySerializer, valuePSerializer);
    StoreBuilder<KeyValueStore<String, Long>> lastSSNstateStoreBuilder =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("lastSSN"), keySerializer, valueLongSerializer);
    topology.addSource("Source", keyDeSerializer, valueDeSerializer, TopicA, TopicB, TopicC)
            .addProcessor("Process", this::getKafkaStreamsNewProcessor, "Source")
            .addStateStore(stateStoreBuilder, "Process")
            .addStateStore(lastSSNstateStoreBuilder, "Process");
    return topology;
}
I have 3 source topics that I'm streaming from, one processor, 2 state stores (in-memory) and NO sink node. However, I do output some data to an outbound topic from within the processor itself.
I was able to bring the app up, and data is consumed for a while. But after roughly 10 minutes of idle time, more data arrives on Topics A, B and C, yet the Streams app becomes unresponsive and doesn't consume anything. There are no ERRORS in the logs.
I have enabled DEBUG logging and still don't see anything useful. The app needs to consume data on a daily basis, but data actively comes through the topics only at certain times of the day. Is there anything else I'm missing?
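To rule out a stream thread dying silently, I'm considering registering a state listener and an uncaught-exception handler before start(). This is only a sketch of the same bean as above; the listener/handler calls are standard KafkaStreams API rather than anything in my current code:
@Bean
@Primary
public KafkaStreams kafkaStreams() {
    Topology topology = this.buildTopology(new StreamsBuilder());
    final KafkaStreams kafkaStreams = new KafkaStreams(topology, createConfigurationProperties());
    // Log every state transition (CREATED -> REBALANCING -> RUNNING -> ERROR ...)
    kafkaStreams.setStateListener((newState, oldState) ->
            log.info("Kafka Streams state changed from {} to {}", oldState, newState));
    // Surface exceptions that kill a StreamThread; this Thread-based overload exists in 2.x,
    // newer releases prefer the StreamsUncaughtExceptionHandler variant.
    kafkaStreams.setUncaughtExceptionHandler((thread, throwable) ->
            log.error("Stream thread {} died", thread.getName(), throwable));
    kafkaStreams.cleanUp();
    kafkaStreams.start();
    return kafkaStreams;
}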
Below is my Streams config from the logs:
acceptable.recovery.lag = 10000
application.id = streams-app1
application.server =
bootstrap.servers = XXXX
buffered.records.per.partition = 1000
built.in.metrics.version = latest
cache.max.bytes.buffering = 10485760
client.id =
commit.interval.ms = 30000
connections.max.idle.ms = 540000
default.deserialization.exception.handler = class org.apache.kafka.streams.errors.LogAndFailExceptionHandler
default.key.serde = class org.apache.kafka.common.serialization.Serdes$StringSerde
default.production.exception.handler = class org.apache.kafka.streams.errors.DefaultProductionExceptionHandler
default.timestamp.extractor = class org.theclearinghouse.chips.kafka.config.MessageTimestampExtractor
default.value.serde = class org.apache.kafka.common.serialization.Serdes$StringSerde
max.task.idle.ms = 0
max.warmup.replicas = 2
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
num.standby.replicas = 0
num.stream.threads = 1
partition.grouper = class org.apache.kafka.streams.processor.DefaultPartitionGrouper
poll.ms = 100
probing.rebalance.interval.ms = 600000
processing.guarantee = at_least_once
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
replication.factor = 1
request.timeout.ms = 40000
retries = 0
retry.backoff.ms = 100
rocksdb.config.setter = null
UPDATE: I do see the message below in the logs:
streams-app1-b3fecd76-3cf1-44ba-90fb-2e07389885c6-StreamThread-1-consumer-08e5aa6d-dedf-4935-a491-6ed96365c0ed sending LeaveGroup request to coordinator bXXXX(id: 2147483644 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2022-08-05 19:40:58,508 [kafka-coordinator-heartbeat-thread | streams-app1] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer clientId=streams-app1-b3fecd76-3cf1-44ba-90fb-2e07389885c6-StreamThread-1-consumer, groupId=streams-app1] Sending LEAVE_GROUP request with header RequestHeader(apiKey=LEAVE_GROUP, apiVersion=4, clientId=streams-ap
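The log message itself suggests either raising max.poll.interval.ms or lowering max.poll.records. As an illustration only (the values below are assumptions, not my current settings), those overrides would go where createConfigurationProperties() builds the Streams properties:
private Properties createConfigurationProperties() {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-app1");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "XXXX");
    // ... existing settings ...
    // Allow more time between poll() calls before the consumer is evicted from the group.
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000);  // example value
    // Hand the processor smaller batches so each poll loop finishes sooner.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);          // example value
    return props;
}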

Related

SockJS increase pool size

I am using SockJS and Stomp to send files to a backend API to be processed.
My problem is that the upload function gets stuck if more than two uploads are done at the same time.
Example:
User 1 -> uploads a file -> the backend receives the file correctly
User 2 -> uploads a file -> the backend receives the file correctly
User 3 -> uploads a file -> the backend is not called until one of the
previous uploads has completed
(after a minute User 1 completes its upload and the third upload starts)
The error I can see through the log is the following:
2021-06-28 09:43:34,884 INFO [MessageBroker-1] org.springframework.web.socket.config.WebSocketMessageBrokerStats.lambda$initLoggingTask$0: WebSocketSession[11 current WS(5)-HttpStream(6)-HttpPoll(0), 372 total, 26 closed abnormally (26 connect failure, 0 send limit, 16 transport error)], stompSubProtocol[processed CONNECT(302)-CONNECTED(221)-DISCONNECT(0)], stompBrokerRelay[null], inboundChannel[pool size = 2, active threads = 2, queued tasks = 263, completed tasks = 4481], outboundChannel[pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 607], sockJsScheduler[pool size = 1, active threads = 1, queued tasks = 14, completed tasks = 581444]
It seems clear that the pool size is full:
inboundChannel[pool size = 2, active threads = 2
but I really cannot find a way to increase the size.
This is the code:
Client side
ws = new SockJS(host + "/createTender");
stompClient = Stomp.over(ws);
Server side configuration
@EnableWebSocketMessageBroker
public class WebSocketBrokerConfig extends AbstractWebSocketMessageBrokerConfigurer {
...
...
    @Override
    public void configureWebSocketTransport(WebSocketTransportRegistration registration) {
        registration.setMessageSizeLimit(100240 * 10240);
        registration.setSendBufferSizeLimit(100240 * 10240);
        registration.setSendTimeLimit(20000);
    }
I've already tried changing the configureWebSocketTransport parameters, but it did not work.
How can I increase the pool size of the socket?
The inbound channel of the WebSocket can be configured by overriding this method:
@Override
public void configureClientInboundChannel(ChannelRegistration registration) {
    registration.taskExecutor().corePoolSize(4);
    registration.taskExecutor().maxPoolSize(4);
}
The official documentation suggests a pool size equal to the number of cores. Once maxPoolSize is reached, additional requests are handled through an internal queue. So, with this configuration I can process 4 requests concurrently.
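Put together, a minimal sketch of the whole configuration class could look like the following. The @Configuration annotation, the imports and the "/createTender" endpoint registration are filled in from the client code above as assumptions, not taken from the original answer:
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.ChannelRegistration;
import org.springframework.web.socket.config.annotation.AbstractWebSocketMessageBrokerConfigurer;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketTransportRegistration;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketBrokerConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // Endpoint name taken from the client code in the question.
        registry.addEndpoint("/createTender").withSockJS();
    }

    @Override
    public void configureWebSocketTransport(WebSocketTransportRegistration registration) {
        // Limits taken from the question above.
        registration.setMessageSizeLimit(100240 * 10240);
        registration.setSendBufferSizeLimit(100240 * 10240);
        registration.setSendTimeLimit(20000);
    }

    @Override
    public void configureClientInboundChannel(ChannelRegistration registration) {
        // Size the inbound executor so more uploads can be handled concurrently;
        // 4 is an example value (roughly the number of cores).
        registration.taskExecutor().corePoolSize(4).maxPoolSize(4);
    }
}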

Spring Cloud stream "Found no committed offset"

I am using Spring Cloud Stream with the Kafka binder. I can see that once the application starts, it emits INFO-level logs every minute for all the input bindings configured in my application.
Configuration in application.properties:
spring.cloud.function.definition=consumeMessage
spring.cloud.stream.bindings.consumeMessage-in-0.destination=Kafka-stream
spring.cloud.stream.bindings.consumeMessage-in-0.group=Kafka-stream-consumer-group
And the logs are:
2021-06-25 11:26:51.329 INFO 89511 --- [pool-3-thread-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-Kafka-stream-consumer-group-5, groupId=Kafka-stream-consumer-group] Found no committed offset for partition Kafka-stream-0
In my opinion this should not happen, because auto-commit of offsets is enabled:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
Did I miss something in the configuration?
It looks like we're getting this log each time a metric is calculated. The suggestion is to move the logger for
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator to WARN level.
Check this thread for more discussion. @garyrussell fyi
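With Spring Boot's standard logging properties, that change would look like this (an illustration of the suggestion above, not part of the original answer):
# Silence the per-minute "Found no committed offset" INFO line from the consumer coordinator
logging.level.org.apache.kafka.clients.consumer.internals.ConsumerCoordinator=WARN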

Data loss (skipping) using Flume with Kafka source and HDFS sink

I am experiencing data loss (skipped chunks of time in the data) when pulling data off a Kafka topic as a source and putting it into an HDFS file (DataStream) as a sink. The pattern seems to be 10, 20 or 30 minute blocks of skipped data. I have verified that the skipped data is in the topic .log file that is being generated by Kafka. (The original data comes from a syslog, goes through a different Flume agent and is put into the Kafka topic - the data loss isn't happening there.)
I find it interesting and unusual that the blocks of skipped data are always 10, 20 or 30 mins and happen at least once an hour in my data.
Here is a copy of my configuration file:
a1.sources = kafka-source
a1.channels = memory-channel
a1.sinks = hdfs-sink
a1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.kafka-source.zookeeperConnect = 10.xx.x.xx:xxxx
a1.sources.kafka-source.topic = firewall
a1.sources.kafka-source.groupId = flume
a1.sources.kafka-source.channels = memory-channel
a1.channels.memory-channel.type = memory
a1.channels.memory-channel.capacity = 100000
a1.channels.memory-channel.transactionCapacity = 1000
a1.sinks.hdfs-sink.type = hdfs
a1.sinks.hdfs-sink.hdfs.fileType = DataStream
a1.sinks.hdfs-sink.hdfs.path = /topics/%{topic}/%m-%d-%Y
a1.sinks.hdfs-sink.hdfs.filePrefix = firewall1
a1.sinks.hdfs-sink.hdfs.fileSuffix = .log
a1.sinks.hdfs-sink.hdfs.rollInterval = 86400
a1.sinks.hdfs-sink.hdfs.rollSize = 0
a1.sinks.hdfs-sink.hdfs.rollCount = 0
a1.sinks.hdfs-sink.hdfs.maxOpenFiles = 1
a1.sinks.hdfs-sink.channel = memory-channel
Any insight would be helpful. I have been searching online for answers for a while.
Thanks.
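Purely as an illustration of one variable worth ruling out (not a confirmed diagnosis): with the memory channel above, anything buffered in the channel is lost if the agent restarts, and back-pressure when it fills up can cause trouble at the source, so a durable file channel could be swapped in to test that. The directory paths below are hypothetical:
a1.channels = file-channel
a1.channels.file-channel.type = file
a1.channels.file-channel.checkpointDir = /var/flume/checkpoint
a1.channels.file-channel.dataDirs = /var/flume/data
a1.channels.file-channel.capacity = 1000000
a1.channels.file-channel.transactionCapacity = 1000
a1.sources.kafka-source.channels = file-channel
a1.sinks.hdfs-sink.channel = file-channel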

Veins: I added extra accident message but can not see the data exchange between nodes

I added an extra event, like the accident, in TraCIMobility.cc, including its message, but I can only see the data exchange between nodes for the accident message and cannot see any data exchange between nodes for the new "icyroad" event.
When using only "icy road", I can see the data exchange between the nodes. Where should I add/edit code in Veins so that I can see the data exchange when two or more events are active?
The .ini is as follows:
*.node[*].veinsmobilityType = "org.car2x.veins.modules.mobility.traci.TraCIMobility"
*.node[*].mobilityType = "TraCIMobility"
*.node[*].mobilityType.debug = true
*.node[*].veinsmobilityType.debug = true
*.node[*].veinsmobility.x = 0
*.node[*].veinsmobility.y = 0
*.node[*].veinsmobility.z = 1.895
*.node[4].veinsmobility.accidentCount = 1
*.node[4].veinsmobility.accidentStart = 45s
*.node[4].veinsmobility.accidentDuration = 30s #30s
*.node[3].veinsmobility.icyroadCount = 1
*.node[3].veinsmobility.icyStart = 15s
*.node[3].veinsmobility.icyDuration = 30s
The sample simulation that is included in Veins 4a2 uses an application layer that only relays the first message it receives. You will need to change this to a more full-featured application layer to make this work with more than one event type.

Flume ElasticSearchSink does not consume all messages

I am using Flume to process log lines into HDFS and to index them into Elasticsearch using ElasticSearchSink.
Here is my configuration:
agent.channels.memory-channel.type = memory
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -4000 /home/cto/hs_err_pid11679.log
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger
#####INTERCEPTORS
agent.sources.tail-source.interceptors = timestampInterceptor
agent.sources.tail-source.interceptors.timestampInterceptor.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
####SINK
# Setting the sink to HDFS
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/data/flume/%y-%m-%d/
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.inUsePrefix =.
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 10000000
agent.sinks.hdfs-sink.hdfs.idleTimeout = 10
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.sinks.elastic-sink.channel = memory-channel
agent.sinks.elastic-sink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elastic-sink.hostNames = 127.0.0.1:9300
agent.sinks.elastic-sink.indexName = flume_index
agent.sinks.elastic-sink.indexType = logs_type
agent.sinks.elastic-sink.clusterName = elasticsearch
agent.sinks.elastic-sink.batchSize = 500
agent.sinks.elastic-sink.ttl = 5d
agent.sinks.elastic-sink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
# Finally, activate.
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink elastic-sink
The problem is that I only see 1-2 messages in Elasticsearch (via Kibana) but lots of messages in the HDFS files.
Any idea what I am missing here?
The problem is related to a bug in the serializer.
If we drop the line:
agent.sinks.elastic-sink.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
the messages are consumed with no problem.
The problem is with the way the @timestamp field is created when using that serializer.
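For reference, the sink definition then reduces to the lines already shown in the question, with the serializer override removed so the sink falls back to its default serializer:
agent.sinks.elastic-sink.channel = memory-channel
agent.sinks.elastic-sink.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elastic-sink.hostNames = 127.0.0.1:9300
agent.sinks.elastic-sink.indexName = flume_index
agent.sinks.elastic-sink.indexType = logs_type
agent.sinks.elastic-sink.clusterName = elasticsearch
agent.sinks.elastic-sink.batchSize = 500
agent.sinks.elastic-sink.ttl = 5d
# serializer line removed; the sink now uses its default serializer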
