Spring Cloud Stream "Found no committed offset"

I am using Spring Cloud Stream with the Kafka binder. Once the application starts, it emits INFO-level logs every minute for all the input bindings configured in my application.
Configuration in application.properties:
spring.cloud.function.definition=consumeMessage
spring.cloud.stream.bindings.consumeMessage-in-0.destination=Kafka-stream
spring.cloud.stream.bindings.consumeMessage-in-0.group=Kafka-stream-consumer-group
And the logs are:
2021-06-25 11:26:51.329 INFO 89511 --- [pool-3-thread-3] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-Kafka-stream-consumer-group-5, groupId=Kafka-stream-consumer-group] Found no committed offset for partition Kafka-stream-0
In my opinion this should not happen, because auto-commit of offsets is enabled:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [localhost:9092]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
Did I miss something in the configuration?

It looks like this log is emitted each time a metric is calculated. The suggestion is to raise the logger org.apache.kafka.clients.consumer.internals.ConsumerCoordinator to WARN level.
Check this thread for more discussion. @garyrussell fyi
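For example, in a Spring Boot application the level can be raised in application.properties (a minimal sketch; adapt it to whatever logging configuration you already use):
logging.level.org.apache.kafka.clients.consumer.internals.ConsumerCoordinator=WARN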

Related

Kafka streams app not streaming/consuming data after idle time

I have a Kafka Streams app built with the Processor API.
@Bean
@Primary
public KafkaStreams kafkaStreams() {
    log.info("Create Kafka Stream Bean with defined topology");
    Topology topology = this.buildTopology(new StreamsBuilder());
    final KafkaStreams kafkaStreams = new KafkaStreams(topology, createConfigurationProperties());
    kafkaStreams.cleanUp();
    kafkaStreams.start();
    return kafkaStreams;
}
private Topology buildTopology(StreamsBuilder streamsBuilder) {
    Topology topology = streamsBuilder.build();
    StoreBuilder<KeyValueStore<String, ParticipantObj>> stateStoreBuilder =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("balance"), keySerializer, valuePSerializer);
    StoreBuilder<KeyValueStore<String, Long>> lastSSNstateStoreBuilder =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore("lastSSN"), keySerializer, valueLongSerializer);
    topology.addSource("Source", keyDeSerializer, valueDeSerializer, TopicA, TopicB, TopicC)
            .addProcessor("Process", this::getKafkaStreamsNewProcessor, "Source")
            .addStateStore(stateStoreBuilder, "Process")
            .addStateStore(lastSSNstateStoreBuilder, "Process");
    return topology;
}
I have 3 source topics that I'm streaming from. I have one processor, 2 state stores (in-memory) and NO sink node; however, I do output some data to an outbound topic from within the processor itself.
I was able to bring the app up, and data gets consumed for a while. But after about 10 minutes of idle time, more data arrives on Topics A, B and C, yet the streams app becomes unresponsive and doesn't consume anything. There are no ERRORs in the logs.
I have enabled DEBUG logs and still don't see anything in them. The app needs to consume data on a daily basis, but data only actively comes through the topics at certain times of the day. Is there anything else I'm missing?
Below is my streams config from logs.
acceptable.recovery.lag = 10000
application.id = streams-app1
application.server =
bootstrap.servers = XXXX
buffered.records.per.partition = 1000
built.in.metrics.version = latest
cache.max.bytes.buffering = 10485760
client.id =
commit.interval.ms = 30000
connections.max.idle.ms = 540000
default.deserialization.exception.handler = class org.apache.kafka.streams.errors.LogAndFailExceptionHandler
default.key.serde = class org.apache.kafka.common.serialization.Serdes$StringSerde
default.production.exception.handler = class org.apache.kafka.streams.errors.DefaultProductionExceptionHandler
default.timestamp.extractor = class org.theclearinghouse.chips.kafka.config.MessageTimestampExtractor
default.value.serde = class org.apache.kafka.common.serialization.Serdes$StringSerde
max.task.idle.ms = 0
max.warmup.replicas = 2
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
num.standby.replicas = 0
num.stream.threads = 1
partition.grouper = class org.apache.kafka.streams.processor.DefaultPartitionGrouper
poll.ms = 100
probing.rebalance.interval.ms = 600000
processing.guarantee = at_least_once
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
replication.factor = 1
request.timeout.ms = 40000
retries = 0
retry.backoff.ms = 100
rocksdb.config.setter = null
UPDATE: I do see the message below in the logs:
streams-app1-b3fecd76-3cf1-44ba-90fb-2e07389885c6-StreamThread-1-consumer-08e5aa6d-dedf-4935-a491-6ed96365c0ed sending LeaveGroup request to coordinator bXXXX(id: 2147483644 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2022-08-05 19:40:58,508 [kafka-coordinator-heartbeat-thread | streams-app1] DEBUG org.apache.kafka.clients.NetworkClient - [Consumer clientId=streams-app1-b3fecd76-3cf1-44ba-90fb-2e07389885c6-StreamThread-1-consumer, groupId=streams-app1] Sending LEAVE_GROUP request with header RequestHeader(apiKey=LEAVE_GROUP, apiVersion=4, clientId=streams-ap
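Based on that message, one possible mitigation (a sketch only; createConfigurationProperties() is the question's own helper, and the values are examples rather than recommendations) is to pass the consumer settings the warning mentions through the Streams configuration:
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

// Consumer-level overrides passed through the Streams configuration
Properties props = createConfigurationProperties();
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), "600000"); // allow more time between poll() calls
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.MAX_POLL_RECORDS_CONFIG), "100");        // return fewer records per poll()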

Join KStream and GlobalKTable with Avro Object on Join key

I have a question regarding key deserialization in Kafka Streams. Specifically, I use Kafka Connect with the Debezium connector to read data from a Postgres table. The data were imported into a Kafka topic, which created two Avro schemas on the Kafka Schema Registry: one for the key and one for the value (the latter contains all the columns of the table).
I read these data into a GlobalKTable like below:
properties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
properties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
GlobalKTable<my.namespace.db.Key, my.namespace.db.Value> tableData = builder.globalTable("topic_name");
My issue is that I have a topology where I need to join this GlobalKTable with a KStream like the one below:
SpecificAvroSerde<EventObj> eventsSpecificAvroSerde = new SpecificAvroSerde<>();
eventsSpecificAvroSerde.configure(Collections.singletonMap(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG,
        conf.getString("kafka.schema.registry.url")), false);
KStream<Integer, EventObj> events = builder.stream("another_topic_name", Consumed.with(Serdes.Integer(), eventsSpecificAvroSerde));
Note that the Avro schema for my.namespace.db.Key is
{
  "type": "record",
  "name": "Key",
  "namespace": "my.namespace.db",
  "fields": [
    { "name": "id", "type": "int" }
  ]
}
Obviously the key of the GlobalKTable and the key of the KStream are different objects, and I do not know how to achieve the join. I initially tried this, but it did not work:
events.join(tableData,
        // convert the Integer key of the KStream into the Avro Key object of the GlobalKTable to achieve the join
        (key, val) -> my.namespace.db.Key.newBuilder().setId(key).build(),
        (ev, tData) -> ... );
The output I get is the following, where I can see a WARN on one of my joined topics (which seems suspect), but nothing else: there is no output of the joined entities, as if there were nothing to consume.
INFO [Consumer clientId=kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1-consumer, groupId=kafka-streams] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:336)
INFO stream-thread [kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1-consumer] Assigned tasks to clients as {0401c29c-30a9-4969-93f9-5a83b3c834b4=[activeTasks: ([0_0]) standbyTasks: ([]) assignedTasks: ([0_0]) prevActiveTasks: ([]) prevAssignedTasks: ([]) capacity: 1]}. (org.apache.kafka.streams.processor.internals.StreamPartitionAssignor:341)
WARN [Consumer clientId=kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1-consumer, groupId=kafka-streams] The following subscribed topics are not assigned to any members: [my-topic] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:241)
INFO [Consumer clientId=kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1-consumer, groupId=kafka-streams] Successfully joined group with generation 1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:341)
INFO [Consumer clientId=kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1-consumer, groupId=kafka-streams] Setting newly assigned partitions [mip-events-2-0] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:341)
INFO stream-thread [kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED (org.apache.kafka.streams.processor.internals.StreamThread:346)
INFO KafkaAvroSerializerConfig values:
schema.registry.url = [http://kafka-schema-registry:8081]
auto.register.schemas = true
max.schemas.per.subject = 1000
(io.confluent.kafka.serializers.KafkaAvroSerializerConfig:175)
INFO KafkaAvroDeserializerConfig values:
schema.registry.url = [http://kafka-schema-registry:8081]
auto.register.schemas = true
max.schemas.per.subject = 1000
specific.avro.reader = true
(io.confluent.kafka.serializers.KafkaAvroDeserializerConfig:175)
INFO KafkaAvroSerializerConfig values:
schema.registry.url = [http://kafka-schema-registry:8081]
auto.register.schemas = true
max.schemas.per.subject = 1000
(io.confluent.kafka.serializers.KafkaAvroSerializerConfig:175)
INFO KafkaAvroDeserializerConfig values:
schema.registry.url = [http://kafka-schema-registry:8081]
auto.register.schemas = true
max.schemas.per.subject = 1000
specific.avro.reader = true
(io.confluent.kafka.serializers.KafkaAvroDeserializerConfig:175)
INFO stream-thread [kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1] partition assignment took 10 ms.
current active tasks: [0_0]
current standby tasks: []
previous active tasks: []
(org.apache.kafka.streams.processor.internals.StreamThread:351)
INFO stream-thread [kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4-StreamThread-1] State transition from PARTITIONS_ASSIGNED to RUNNING (org.apache.kafka.streams.processor.internals.StreamThread:346)
INFO stream-client [kafka-streams-0401c29c-30a9-4969-93f9-5a83b3c834b4]State transition from REBALANCING to RUNNING (org.apache.kafka.streams.KafkaStreams:346)
Can I make this join work in Kafka Streams?
Note that this works if I use a KTable to read the topic and use selectKey on the KStream to convert the key, but I want to avoid the repartition.
Or would the right approach be to import my data from the database in another way, so as to avoid creating Avro objects for the key? And how would that be possible using the Debezium connector and Kafka Connect with the AvroConverter enabled?
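For reference, a minimal sketch of the selectKey/KTable variant mentioned above (it does trigger the repartition the question tries to avoid; the joiner body is a placeholder):
// Read the topic as a KTable using the default SpecificAvroSerde configured earlier
KTable<my.namespace.db.Key, my.namespace.db.Value> table = builder.table("topic_name");
// Re-key the stream from Integer to the Avro Key object, then join; this creates a repartition topic
KStream<my.namespace.db.Key, EventObj> rekeyed =
        events.selectKey((intKey, ev) -> my.namespace.db.Key.newBuilder().setId(intKey).build());
rekeyed.join(table, (ev, tData) -> /* combine the event and the table row as needed */ ev);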

changes to log4j2.properties file results in failure of elasticsearch

I have installed Elasticsearch (6.6.0) on CentOS 7. I want to add some more properties for rotating logs, for example rotate and compress when the size reaches 50 MB. But if I add any more configuration to the /etc/elasticsearch/log4j2.properties file and restart the Elasticsearch server, it fails.
My current log4j2.properties file:
status = error
# log action execution errors for easier debugging
logger.action.name = org.elasticsearch.action
logger.action.level = debug
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n
appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.-10000m%n
appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.log.gz
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
rootLogger.level = info
rootLogger.appenderRef.console.ref = console
rootLogger.appenderRef.rolling.ref = rolling
When I try to add the following, which is how the Elasticsearch documentation says to add this configuration:
appender.rolling.policies.size.size = 2MB
appender.rolling.strategy.action.condition.age = 3D
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.condition.type = IfFileName
It fails with this error:
Exception in thread "main" org.apache.logging.log4j.core.config.ConfigurationException: No type attribute provided for component size
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationBuilder.createComponent(PropertiesConfigurationBuilder.java:333)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationBuilder.processRemainingProperties(PropertiesConfigurationBuilder.java:347)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationBuilder.createComponent(PropertiesConfigurationBuilder.java:336)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationBuilder.processRemainingProperties(PropertiesConfigurationBuilder.java:347)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationBuilder.createAppender(PropertiesConfigurationBuilder.java:224)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationBuilder.build(PropertiesConfigurationBuilder.java:157)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationFactory.getConfiguration(PropertiesConfigurationFactory.java:56)
at org.apache.logging.log4j.core.config.properties.PropertiesConfigurationFactory.getConfiguration(PropertiesConfigurationFactory.java:35)
at org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:244)
at org.elasticsearch.common.logging.LogConfigurator$1.visitFile(LogConfigurator.java:105)
at org.elasticsearch.common.logging.LogConfigurator$1.visitFile(LogConfigurator.java:101)
at java.nio.file.Files.walkFileTree(Files.java:2670)
at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:101)
at org.elasticsearch.common.logging.LogConfigurator.configure(LogConfigurator.java:84)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:316)
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123)
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114)
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67)
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122)
at org.elasticsearch.cli.Command.main(Command.java:88)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84)
And I have a warning in /var/log/elasticsearch/elasticsearch_deprecation.log:
[2018-02-20T02:09:32,694][WARN ][o.e.d.e.NodeEnvironment ] ES has detected the [path.data] folder using the cluster name as a folder [/data/es], Elasticsearch 6.0 will not allow the cluster name as a folder within the data path
Can anyone please explain how to add this configuration to the log4j2.properties file?
As the log states, you are missing the type attribute for the size configuration. You are also missing the type attribute for the RolloverStrategy.
Try the configuration below:
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size = 2 MB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basePath = ${sys:es.logs.base_path}${sys:file.separator}
appender.rolling.strategy.action.maxDepth = 1
appender.rolling.strategy.action.ifLastModified.type = IfLastModified
appender.rolling.strategy.action.ifLastModified.age = 3d
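If you also want to keep the IfFileName condition from your attempt, so that the Delete action only matches this cluster's rolled-over logs, one option (a sketch following the nested-condition pattern used in the Elasticsearch docs; verify it against your Log4j version) is to nest the age check under the file-name condition:
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
appender.rolling.strategy.action.condition.type = IfFileName
appender.rolling.strategy.action.condition.glob = ${sys:es.logs.cluster_name}-*
appender.rolling.strategy.action.condition.nested_condition.type = IfLastModified
appender.rolling.strategy.action.condition.nested_condition.age = 3d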

Why is SourceConnectorConfig reported for sink connector?

I'm trying to create a Kafka sink connector using the spredfast s3 connector. However, for some reason, the log output is reporting a SourceConnectorConfig:
INFO ConnectorConfig values:
connector.class = com.spredfast.kafka.connect.s3.sink.S3SinkConnector
key.converter = null
name = transactions-s3-sink
tasks.max = 1
transforms = null
value.converter = class org.apache.kafka.connect.storage.StringConverter
(org.apache.kafka.connect.runtime.ConnectorConfig:180)
INFO Creating connector transactions-s3-sink of type com.spredfast.kafka.connect.s3.sink.S3SinkConnector (org.apache.kafka.connect.runtime.Worker:178)
INFO Instantiated connector transactions-s3-sink with version 0.0.1 of type class com.spredfast.kafka.connect.s3.sink.S3SinkConnector (org.apache.kafka.connect.runtime.Worker:181)
INFO Finished creating connector transactions-s3-sink (org.apache.kafka.connect.runtime.Worker:194)
INFO SourceConnectorConfig values:
connector.class = com.spredfast.kafka.connect.s3.sink.S3SinkConnector
key.converter = null
name = transactions-s3-sink
tasks.max = 1
transforms = null
value.converter = class org.apache.kafka.connect.storage.StringConverter
(org.apache.kafka.connect.runtime.SourceConnectorConfig:180)
INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:824)
...
INFO Sink task WorkerSinkTask{id=transactions-s3-sink-0} finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:232)
Why is a SourceConnectorConfig reported, yet further on in the log output I can see that a WorkerSinkTask was created?
The reason is that this connector extends the Connector abstract class instead of the SinkConnector abstract class from Connect's API (see the source code here).
Thus, the Connect framework can't tell whether this connector is a source or a sink, and the current logic in the code is: if it's not a sink, assume it's a source. That's why you see this inconsistency.
The solution is for the connector to extend the appropriate abstract class (here org.apache.kafka.connect.sink.SinkConnector).
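For illustration, a minimal sketch of what that change looks like (the class and task names are hypothetical, not the actual Spredfast code):
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;

public class MySinkConnector extends SinkConnector { // extends SinkConnector rather than Connector

    private Map<String, String> settings;

    @Override public String version() { return "0.0.1"; }

    @Override public void start(Map<String, String> props) { this.settings = props; }

    // MySinkTask is a hypothetical SinkTask implementation
    @Override public Class<? extends Task> taskClass() { return MySinkTask.class; }

    @Override public List<Map<String, String>> taskConfigs(int maxTasks) {
        return Collections.nCopies(maxTasks, settings);
    }

    @Override public void stop() { }

    @Override public ConfigDef config() { return new ConfigDef(); }
}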

How to test log rolling and deletion in elasticsearch?

I am using the configuration below, taken from the Elasticsearch docs. Instead of waiting for 7D or a day, how can I test this immediately?
Below is my log4j2.properties file
...
appender.deprecation_rolling.type = RollingFile
appender.deprecation_rolling.name = deprecation_rolling
appender.deprecation_rolling.fileName = ${sys:es.logs}_deprecation.log
appender.deprecation_rolling.layout.type = PatternLayout
appender.deprecation_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.10000m%n
appender.deprecation_rolling.filePattern = ${sys:es.logs}_deprecation-%i.log.gz
appender.deprecation_rolling.policies.type = Policies
appender.deprecation_rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.deprecation_rolling.policies.size.size = 1GB
appender.deprecation_rolling.strategy.type = DefaultRolloverStrategy
appender.deprecation_rolling.strategy.max = 4
logger.deprecation.name = org.elasticsearch.deprecation
logger.deprecation.level = warn
logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_rolling
logger.deprecation.additivity = false
...
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
appender.rolling.strategy.action.condition.type = IfLastModified
appender.rolling.strategy.action.condition.age = 1D
appender.rolling.strategy.action.PathConditions.type = IfFileName
appender.rolling.strategy.action.PathConditions.glob = ${sys:es.logs.cluster_name}-*
Note: I am using Elasticsearch 5.0.1.
Update: I do not want to wait for a day (1D) to test whether the log files are being deleted or not. How can I test this scenario within 10 minutes or so? Something like: rolling happens every minute, and deletion happens for logs older than 10 minutes.
Yes, there is a way.
I use a size-based triggering policy to force a rollover, and thereby the Delete action, so I can test whether my log4j2.properties works.
This is an example of our log4j2.properties file; the change is the two SizeBasedTriggeringPolicy lines at the end.
appender.rolling.type = RollingFile
appender.rolling.name = rolling
appender.rolling.fileName = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}.log
appender.rolling.layout.type = PatternLayout
appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%.-10000m%n
appender.rolling.filePattern = ${sys:es.logs.base_path}${sys:file.separator}${sys:es.logs.cluster_name}-%d{yyyy-MM-dd}-%i.log.gz
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size = 100KB
Then I change the logging level to DEBUG in Elasticsearch:
PUT /_cluster/settings
{"transient":{"logger._root":"DEBUG"}}
That way I generate a lot of log output, which triggers the RollingFile appender and its associated actions.
So you can check your log4j2.properties file quickly, without having to wait 24 hours.
When you want to stop your test, you must set the default value:
PUT /_cluster/settings
{"transient":{"logger._root":"ERROR"}}

Resources