Failed to deserialize data for topic to avro sink connector - apache-kafka-connect

I'm using Confluent to connect an RDBMS to Kafka, and it works.
Some ETL with KSQL works too. But when I want to sink my stream/table back to the RDBMS, there is a problem.
Checking my topic/stream to confirm it is Avro:
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --from-beginning --max-messages 1 --topic STR_VAHG_REKEY_02 | jq '.'
result:
{
  "REKEY02": {
    "long": 3941641584970777000
  }
}
That seems to work fine, but when I sink it:
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:484)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic STR_VAHG_REKEY_02 to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:110)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$0(WorkerSinkTask.java:484)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Here is my sink file:
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=STR_VAHG_REKEY_02
connection.url=jdbc:mysql://
connection.user=
connection.password=
auto.create=true
timestamp.column.name=create_at
validate.non.null=false
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
I'm lost already; any suggestion would be nice.

Your avro-console-consumer output correctly shows that the value is Avro, but ksqlDB writes message keys as strings, not Avro.
Replace these two lines
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
with
key.converter=org.apache.kafka.connect.storage.StringConverter
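As a quick sanity check (optional; this assumes the stock Kafka console tools), you can print the raw key next to the value:
./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic STR_VAHG_REKEY_02 --from-beginning --max-messages 1 --property print.key=true --property key.separator=' | '
An Avro-encoded key would start with the 0x00 magic byte followed by a 4-byte schema id; a ksqlDB string key prints as readable text.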

I was also having the same issue. In my case, I was sending plain-text and Avro messages on the same topic, which is why the consumer could not deserialize the data: it expected only Avro messages but was receiving both.
How I fixed it: I deleted the topic and sent only Avro messages, and the issue went away.
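If deleting the topic is not an option, a possible alternative (a sketch using Kafka Connect's built-in error-handling properties, available since Kafka 2.0; the DLQ topic name here is just an example) is to have the sink tolerate bad records and park them on a dead letter queue instead of failing:
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq-my-sink
errors.deadletterqueue.context.headers.enable=true
The producer mixing formats still needs fixing, but the connector keeps running rather than dying on the first non-Avro record.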

Related

RocksDB exception in Kafka Streams with kafka_2.13-3.2.0 running on Windows

I have Kafka 2.13-3.2.0 running on a Windows machine. I am trying a stream join operation and getting the following error. I can see that the same issue was fixed in version 1.0.1, per https://issues.apache.org/jira/browse/KAFKA-6162, but I am still getting this error with Kafka 2.13-3.2.0.
Error Logs:
Caused by: org.rocksdb.RocksDBException: Failed to create dir: D:\tmp\kafka-streams\join_driver_application\1_0\KSTREAM-JOINTHIS-0000000014-store\KSTREAM-JOINTHIS-0000000014-store:1661385600000: Invalid argument
at org.rocksdb.RocksDB.open(Native Method)
at org.rocksdb.RocksDB.open(RocksDB.java:231)
at org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:197)
... 23 more

Unable to convert to timestamp using Kafka timestampconvert

I am using the Kafka JDBC source connector to pull DB events, and I am running Kafka Connect in standalone mode. When I run this file, I get the error shown below. Please help me.
Code:
name=sailpointdb01107
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.password=xxxxx
connection.url=jdbc:oracle:thin:#xxxxx:1521/xxxxx
connection.user=xxxxx
query= SELECT * FROM (SELECT NAME, TO_TIMESTAMP('19700101', 'YYYYMMDD')+ NUMTODSINTERVAL(COMPLETED/1000,'SECOND') AS TASKFAILEDON FROM task WHERE COMPLETION_STATUS='Error')
mode= timestamp
timestamp.column.name=TASKFAILEDON
topic.prefix=testing
validate.non.null=false
transforms=TimestampConverter
transforms.TimestampConverter.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.TimestampConverter.format=yyyy-MM-dd
transforms.TimestampConverter.target.type=Timestamp
transforms.TimestampConverter.target.field=TASKFAILEDON
Error:
[2019-10-01 15:17:45,058] ERROR WorkerSourceTask{id=sailpointdb01107-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
Caused by: org.apache.kafka.connect.errors.ConnectException: Schema Schema{STRUCT} does not correspond to a known timestamp type format
at org.apache.kafka.connect.transforms.TimestampConverter.timestampTypeFromSchema(TimestampConverter.java:406)
at org.apache.kafka.connect.transforms.TimestampConverter.applyWithSchema(TimestampConverter.java:334)
at org.apache.kafka.connect.transforms.TimestampConverter.apply(TimestampConverter.java:275)
at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 11 more
[2019-10-01 15:17:45,059] ERROR WorkerSourceTask{id=sailpointdb01107-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask)
This line, configured on the connector, could avoid that issue:
time.precision.mode: "connect"
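One more thing that may be worth checking: the stock TimestampConverter SMT has no target.field option. It takes a field property naming the struct field to convert (an empty field means the whole value, which is exactly the STRUCT the error complains about). A corrected transform block might look like this (a sketch reusing the column name from the question):
transforms=TimestampConverter
transforms.TimestampConverter.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.TimestampConverter.field=TASKFAILEDON
transforms.TimestampConverter.target.type=Timestamp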

internal.S3AbortableInputStream on hadoop fs -get s3 to EMR

When I ssh onto an EMR cluster and do the following command:
hadoop fs -get s3://path/to/my/files
I am getting the following error, and the file transfer fails partway through. I have used this command in the past, so I'm not sure what's up. Could it be related to the files' encryption? What would cause the stream to consistently close?
WARN internal.S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Exception in thread "main" org.apache.hadoop.fs.FSError: java.io.IOException: Stream Closed
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:253)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:74)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:108)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:261)
at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:478)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:395)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:328)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:263)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:248)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:319)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
Caused by: java.io.IOException: Stream Closed
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:251)
... 30 more
My best guess: Not enough space on the cluster for the files.
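If so, it should be visible in local disk usage while the copy runs: hadoop fs -get writes to the local filesystem of the node you are logged into, so running this in the destination directory during the transfer would show the partition filling up:
df -h .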

Increase concurrency on supported queue of topic:channel

I was testing the duplication of messages to different sinks on Spring XD 1.3.0.RELEASE. My configuration is a three-node cluster backed by RabbitMQ as the message bus.
My test was something like this:
First Case
stream create sourceToDuplicate --definition "trigger --fixedDelay=1 --timeUnit=MILLISECONDS --payload='test' > topic:test" --deploy
stream create processMessages1 --definition "topic:test > cassandra --initScript=file:<absolut-path-to>/int-db.cql --ingestQuery='insert into book (isbn, title, author) values (uuid(), ?, ?)'"
stream create processMessages2 --definition "topic:test > aggregator --count=1000 --timeout=1000 | file" --deploy
Now, in order to increase the number of consumers on the cassandra sink, I want to deploy the first stream with "module.cassandra.consumer.concurrency=10". This property makes the deployment fail.
My workaround is now a fourth stream, so that I can increase the consumers:
Second Case
stream create topicToQueue1 --definition "topic:test > queue:test1" --deploy
stream create processMessage1 --definition "queue:test1 > cassandra..."
stream deploy processMessage1 --properties "module.cassandra.consumer.concurrency=10"
Finally, my question: why does the first case fail, if RabbitMQ already has a queue added for the topic: channel where more consumers are allowed?
Merry Christmas to everyone
--- Update ---
Version: SpringXD 1.3.0.RELEASE
Error:
2015-12-18T13:58:28+0100 1.3.0.RELEASE INFO DeploymentSupervisor-0
zk.ZKStreamDeploymentHandler - Deployment status for stream 'processMessage1':
DeploymentStatus{state=failed,error(s)=java.lang.IllegalArgumentException:
RabbitMessageBus does not support consumer property: concurrency for processMessage1.topic:test.
at org.springframework.xd.dirt.integration.bus.MessageBusSupport.validateProperties(MessageBusSupport.java:786)
at org.springframework.xd.dirt.integration.bus.MessageBusSupport.validateConsumerProperties(MessageBusSupport.java:757)
at org.springframework.xd.dirt.integration.rabbit.RabbitMessageBus.bindPubSubConsumer(RabbitMessageBus.java:472)
at org.springframework.xd.dirt.plugins.AbstractMessageBusBinderPlugin.bindMessageConsumer(AbstractMessageBusBinderPlugin.java:275)
at org.springframework.xd.dirt.plugins.AbstractMessageBusBinderPlugin.bindConsumerAndProducers(AbstractMessageBusBinderPlugin.java:155)
at org.springframework.xd.dirt.plugins.stream.StreamPlugin.postProcessModule(StreamPlugin.java:73)
at org.springframework.xd.dirt.module.ModuleDeployer.postProcessModule(ModuleDeployer.java:238)
at org.springframework.xd.dirt.module.ModuleDeployer.doDeploy(ModuleDeployer.java:218)
at org.springframework.xd.dirt.module.ModuleDeployer.deploy(ModuleDeployer.java:200)
at org.springframework.xd.dirt.server.container.DeploymentListener.deployModule(DeploymentListener.java:365)
at org.springframework.xd.dirt.server.container.DeploymentListener.deployStreamModule(DeploymentListener.java:334)
at org.springframework.xd.dirt.server.container.DeploymentListener.onChildAdded(DeploymentListener.java:181)
at org.springframework.xd.dirt.server.container.DeploymentListener.childEvent(DeploymentListener.java:149)
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:509)
at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:503)
at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)
at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:500)
at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)
at org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run(PathChildrenCache.java:762)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
You can't have concurrency > 1 on a topic: named channel - otherwise each thread will get a copy of the message.
If you want to use concurrency on a named channel, it has to be a queue: so each thread competes for messages.

spring XD rabbit source module fails to process messages, first message stays unacknowledged

I am trying a simple Spring XD application to load log events into HDFS. I have configured the target application with the spring-amqp/rabbit log4j appender (the org.springframework.amqp.rabbit.log4j.AmqpAppender class) to pump log messages to a pre-configured exchange. I set up the following stream to pull those messages and push them to HDFS, where both the source and sink modules are off-the-shelf XD modules.
Stream definition:
xd:>stream create --name demoQ1 --definition "rabbit | hdfs --rollover=15 --directory=/user/root" --deploy
Created and deployed new stream 'demoQ1'
xd:>stream list
Stream Name Stream Definition Status
----------- -------------------------------------------------- --------
demoQ1 rabbit | hdfs --rollover=15 --directory=/user/root deployed
The AMQP appender is publishing messages to the exchange and routing them to the demoQ1 queue, where the rabbit source picks up the first message and then gets stuck, because it never acknowledges the message. What could be the reason?
In your container log, do you see this: "failed to write Message payload to HDFS" ?
If so, then you need to use type conversion between modules. From the rabbit source to the hdfs sink, the messages will simply be byte arrays.
Your stream definition could be,
stream create --name demoQ1 --definition "rabbit --outputType=text/plain | hdfs --rollover=15 --directory=/user/root" --deploy
or,
stream create --name demoQ1 --definition "rabbit | hdfs --inputType=text/plain --rollover=15 --directory=/user/root" --deploy
Note the outputType or the inputType option in source/sink respectively.
In this case, the hdfs sink's HdfsStoreMessageHandler expects the payload to be of type String.
For more details on the type conversion, please check this out:
https://github.com/spring-projects/spring-xd/wiki/Type-Conversion
I enabled debug logs on the Spring XD container running the rabbit module. They showed the following exception happening repeatedly for the first message; the message is requeued each time, so it stays unacknowledged and the rabbit source cannot process any further messages.
To resolve the problem, I removed this property from the log4j appender properties: log4j.appender.amqp.contentEncoding=null. That property explicitly sets the encoding name to the string "null", which seems to be a bug; I was expecting null to mean no encoding specified :)
The exception in the log, continuously repeating as the message is rejected and requeued:
19:29:17,713 DEBUG SimpleAsyncTaskExecutor-1 listener.BlockingQueueConsumer:268 - Received message: (Body:'Hello' MessageProperties [headers={categoryName=org.apache.hadoop.yarn.server.nodemanager.NodeManager, level=INFO}, timestamp=Sat Apr 19 19:21:52 PDT 2014, messageId=null, userId=null, appId=NodeManager, clusterId=null, type=null, correlationId=null, replyTo=null, contentType=text/plain, contentEncoding=null, contentLength=0, deliveryMode=PERSISTENT, expiration=null, priority=0, redelivered=true, receivedExchange=test-exch, receivedRoutingKey=rk1, deliveryTag=184015, messageCount=0])
19:29:17,715 WARN SimpleAsyncTaskExecutor-1 listener.SimpleMessageListenerContainer:530 - Execution of Rabbit message listener failed, and no ErrorHandler has been set.
org.springframework.amqp.rabbit.listener.ListenerExecutionFailedException: Listener threw exception
at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.wrapToListenerExecutionFailedExceptionIfNeeded(AbstractMessageListenerContainer.java:751)
at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:690)
at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:583)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$001(SimpleMessageListenerContainer.java:75)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$1.invokeListener(SimpleMessageListenerContainer.java:154)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.invokeListener(SimpleMessageListenerContainer.java:1111)
at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:556)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doReceiveAndExecute(SimpleMessageListenerContainer.java:904)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.receiveAndExecute(SimpleMessageListenerContainer.java:888)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$500(SimpleMessageListenerContainer.java:75)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:989)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.springframework.amqp.support.converter.MessageConversionException: failed to convert text-based Message content
at org.springframework.amqp.support.converter.SimpleMessageConverter.fromMessage(SimpleMessageConverter.java:100)
at org.springframework.integration.amqp.inbound.AmqpInboundChannelAdapter$1.onMessage(AmqpInboundChannelAdapter.java:73)
at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:688)
... 10 more
Caused by: java.io.UnsupportedEncodingException: null
at java.lang.StringCoding.decode(StringCoding.java:190)
at java.lang.String.<init>(String.java:416)
at java.lang.String.<init>(String.java:481)
at org.springframework.amqp.support.converter.SimpleMessageConverter.fromMessage(SimpleMessageConverter.java:97)
... 12 more
19:29:17,715 DEBUG SimpleAsyncTaskExecutor-1 listener.BlockingQueueConsumer:657 - Rejecting messages (requeue=true)
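For reference, a minimal appender configuration without the problematic line might look like this (a sketch; property names follow the Spring AMQP log4j AmqpAppender, with the exchange and routing key taken from the log above, so adjust them to your environment):
log4j.appender.amqp=org.springframework.amqp.rabbit.log4j.AmqpAppender
log4j.appender.amqp.host=localhost
log4j.appender.amqp.port=5672
log4j.appender.amqp.exchangeName=test-exch
log4j.appender.amqp.routingKeyPattern=rk1
log4j.appender.amqp.contentType=text/plain
log4j.appender.amqp.layout=org.apache.log4j.PatternLayout
log4j.appender.amqp.layout.ConversionPattern=%d %p %t [%c] - %m%n
Leaving contentEncoding unset lets the message converter fall back to its default, instead of trying to look up a charset literally named "null".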
